Closed ichitaka closed 5 years ago
What version of scipy did you use?
Sorry i forgot that. Current version 1.1.0.
yes, looks like levene
can only handle 1d inputs, simplifying the code a bit, one the error is due to axis=0
when defining func
.
k = len(args)
Ni = zeros(k)
Yci = zeros(k, 'd')
func = lambda x: np.median(x, axis=0)
for j in range(k):
Ni[j] = len(args[j])
Yci[j] = func(args[j])
func(args[j])
has shape (10, ) for test_a, test_b, hence, cannot be assigned to the array element. axis=0
is used in other parts of the function as well.
the test is well-explained in one of the references, that should help to fix it
https://www.itl.nist.gov/div898/handbook/eda/section3/eda35a.htm
btw, I doubt that the results of bartlett
is correct. Shouldn't this give the same result?
stats.bartlett(test_a, test_b)
stats.bartlett(test_a.flatten(), test_b.flatten())
len(args[j])
gives 83 in the first case, but there are 830 elements. Or do I miss sth?
Shouldn't this give the same result?
No, it shouldn't. scipy.stats doesn't flatten anything unless axis=None.
I think in this case, the standard numpy broadcasting should apply. Each column of the first array is compared with the same column of the second array. The result should be vectorized across columns. If this is not the case, then the broadcasting in bartlett doesn't allow correctly for arrays with dimension greater than 1.
(you could check t-test to see whether and how it works there.)
If i understand the bartlett test correctly, it should either not allow my input or flatten it (in this case this should maybe be mention in documentation?).
AFAICS, this is more likely a bug, i.e. bartlett doesn't have consistent and correct behavior for 2-d inputs.
var
is missing an axis keyword
https://github.com/scipy/scipy/blob/8f81b305ab3562f295422a7a5fdc27eec0e5f776/scipy/stats/morestats.py#L1866
incomplete review of functions for "unexpected" inputs
If i understand the bartlett test correctly, it should either not allow my input or flatten it
In many cases it is useful to run a vectorized test on all columns at the same time., e.g. many mean comparisons in ttest_indep, so the default for vectorized operations in scipy.stats is axis=0
in contrast to numpy's default axis=None
and ravelling.
(In stats we don't want to ravel heterogeneous columns in a "DataFrame", e.g. what's the variance if column 0 is price and column 1 is quantity purchased.)
Given the current implementation, both functions only work for 1d input (bartlett
returns a result for 2d input as well but it is not correct).
There are multiple options to fix this:
var
, mean
to it (equivalent to axis=None) (this is the behaviour the user expects from the current doc I guess, requires minor adjustment of the code)While there is an axis keyword for the t-test, most other test don't have it (KS, wilcoxon, shapiro, anderson_ksample), so there is no consistent treatment. In particular, anderson_ksample
requires the k samples to be combined in a list instead of using *args
.
I think it's better to stick with axis=0 default in stats and raise otherwise.
I think many tests have most likely not been reviewed for multi-dimensional behavior.
to the items
So while i am capable of doing the bartlett test with two numpy-arrays of dimensions (83,10), i can't do the levene test with the exact same arrays. I receive the following error when calling stats.levene:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/stats/morestats.py", line 1944, in levene Yci[j] = func(args[j]) ValueError: setting an array element with a sequence.
Code to reproduce: