Open alejandroschuler opened 4 years ago
multinomial.rvs() calls random_state.multinomial(n, p, size)
https://github.com/scipy/scipy/blob/7e3cda70f5448bc2afa019d7b49533eaba33090e/scipy/stats/_multivariate.py#L3218-L3220
random_state.multinomial
doesn't appear to deal with the multi-dimensional pmulti
. I.e. It returns only the values corresponding to the first row of pmulti
pmulti = np.array([[0,1,0],[1,0,0],[0,0,1]])
random_state = np.random.mtrand._rand
random_state.multinomial(1, pmulti, None)
# array([0, 1, 0])
random_state.multinomial(1, pmulti[0], None)
# array([0, 1, 0])
random_state.multinomial(1, pmulti[2], None)
# array([0, 0, 1])
# What you would like is the equivalent of
np.array([rs2.multinomial(1, _p, None) for _p in pmulti])
# array([[0, 1, 0],
# [1, 0, 0],
# [0, 0, 1]])
The multinomial docstring is https://github.com/scipy/scipy/blob/7e3cda70f5448bc2afa019d7b49533eaba33090e/scipy/stats/_multivariate.py#L2949-L2952
It gives a list of some methods that support broadcasting and rvs
is not on that list. There is a comment in the original PR https://github.com/scipy/scipy/pull/6375#issuecomment-233080752 that suggests this issue was known at the time.
@pvanmulbregt thanks for looking into it! I didn't assume this was a bug per se, but I wanted to point out that regardless of the reason, the behavior of .rvs is inconsistent in this case in a way that makes building generic packages on top of scipy.stats difficult. It's especially weird now that I know that .pmf actually does support broadcasting! :)
@alejandroschuler would you be interested in submitting a PR with a fix?
FYI: in one of the examples presented np.random.Generator
appears to broadcast. Is this the correct behaviour?
>>> rg = np.random.default_rng()
>>> rg.multinomial(n=np.array([1, 2]), pvals=np.array([[0, 0, 1], [1, 0, 0]]))
array([[0, 1],
[0, 2]])
Upon further investigation, multinomial
is not taking the first row.
When passed an array of shape (m,p)
, it decides the number of possible values is m
(not p
), and takes the first (m-1)
elements of the array and uses those. This selection may cross rows. E.g.
In[86]: pmulti = np.array([[0,1,0],[0.5,0,0.5],[0,0,1], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]])
In [87]: pmulti
Out[87]:
array([[0. , 1. , 0. ],
[0.5, 0. , 0.5],
[0. , 0. , 1. ],
[0.1, 0.1, 0.8],
[0.2, 0.2, 0.6]])
In [88]: np.shape(pmulti)
Out[88]: (5, 3)
In [89]: rng.multinomial(20, pmulti)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-85-541ce7a1994a> in <module>
----> 1 rng.multinomial(20, pmulti)
generator.pyx in numpy.random.generator.Generator.multinomial()
ValueError: sum(pvals[:-1]) > 1.0
The error occurs because it treated the (5,3)
array as an array of length 5, and used 3 values from row 0, and 2 values from row 1. It summed the first 4 elements of [0, 1, 0, 0.5, 0, 0.5, ..., 0.2, 0.2, 0.6] to get 1.5 which is bigger than 1, so raised an Exception.
@ev-br I'm not at all familiar with the scipy.stats codebase so I'm probably not the right person. It looks like another issue has been raised (https://github.com/numpy/numpy/issues/15492) that's more specific so hopefully folks will narrow in on the issue and resolve it there.
The reason I bring this up is because I'm one of the authors of the newly-popular NGBoost package. Right now, the process for developing a new distribution to add to NGBoost is not as user-friendly as I'd like it to be. In my ideal world, it would be as easy as pointing to an implementation in scipy.stats, but I can't quite get there because of some inconsistencies and my unfamiliarity with some necessary internals.
By any chance do you know if there someone who is an involved contributor to scipy.stats that might be willing to email or chat with me about a tighter integration between NGBoost and scipy.stats?
It appears that
stats.scipy.multinomial
does not support vectorized operations in a way that is consistent with other scipy.stats distributions that I have used:It also seems that
n
can't be vectorized:returns an error, whereas
does exactly what one would expect.