stikpet closed 1 month ago
Handling missing values is outside the scope of this package. But something should be done about it, I agree. Maybe throwing a warning will be enough... Anyway, thank you for drawing attention to it.
A simple dropna() at the beginning should be enough to fix things, or indeed a warning about the missing values.
Dropping something silently is not a good thing. I will think about it. Maybe let's have a look at some references, bigger packages with millions of users.
I'm not familiar with other packages in Python that can perform the test. In R, however, there is dunn.test from the dunn.test library, which doesn't give any warnings and simply removes the missing values. Another R library, FSA, has a dunnTest function that does add a warning: "Some rows deleted from 'x' and 'g' because missing data". A little program from IBM named SPSS Statistics also does not give any warnings and simply removes the missing values in the calculations.
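For illustration, the FSA-style behaviour (drop the incomplete rows, but say so) could look roughly like this in pandas. This is only a sketch: `drop_missing_with_warning` and the column names `score` and `group` are made up for the example and are not part of this package.

```python
import warnings

import numpy as np
import pandas as pd


def drop_missing_with_warning(df, val_col, group_col):
    """Hypothetical helper: drop rows missing a value or a group label, with a warning."""
    cleaned = df.dropna(subset=[val_col, group_col])
    n_dropped = len(df) - len(cleaned)
    if n_dropped > 0:
        # Tell the user instead of dropping silently.
        warnings.warn(
            f"{n_dropped} row(s) deleted from '{val_col}' and '{group_col}' "
            "because of missing data",
            UserWarning,
            stacklevel=2,
        )
    return cleaned


# Made-up data: two of the six scores are missing.
df = pd.DataFrame({
    "score": [1.0, 2.5, np.nan, 4.0, 3.5, np.nan],
    "group": ["a", "a", "a", "b", "b", "b"],
})
cleaned = drop_missing_with_warning(df, "score", "group")
print(len(cleaned))  # 4 rows remain
```

The cleaned frame could then be passed on to the test as usual, so the user sees the warning once and the calculation only uses complete rows.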
Thanks for still answering on this and of course for sharing your library.
Fixed in v0.10.0
I think the function for Dunn still counts as sample size the number of scores in the categorical field, even if there is no value in the numerical field, i.e. it includes missing values. I don't think this is correct.
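To show the kind of difference I mean (this is not the package's actual code, just a pandas sketch with made-up column names `score` and `group`): counting rows per group includes the missing scores, while counting non-missing values does not.

```python
import numpy as np
import pandas as pd

# Made-up data: group "a" has 3 rows but only 2 scores, group "b" has 4 rows but only 3 scores.
df = pd.DataFrame({
    "score": [1.0, 2.5, np.nan, 4.0, 3.5, np.nan, 5.0],
    "group": ["a", "a", "a", "b", "b", "b", "b"],
})

# size() counts every row in the group, including rows where the score is NaN.
n_rows = df.groupby("group")["score"].size()

# count() counts only the non-missing scores.
n_valid = df.groupby("group")["score"].count()

print(n_rows.to_dict())   # {'a': 3, 'b': 4}
print(n_valid.to_dict())  # {'a': 2, 'b': 3}
```

If the sample sizes in the test statistic come from something like the first count, they would be too large whenever scores are missing.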