maximtrp / scikit-posthocs

Multiple Pairwise Comparisons (Post Hoc) Tests in Python
https://scikit-posthocs.rtfd.io
MIT License

Dunn and missing values #70

Open · stikpet opened this issue 4 months ago

stikpet commented 4 months ago

I think the Dunn test function (`posthoc_dunn`) still uses, as the sample size, the number of entries in the categorical (group) column, even when the corresponding value in the numerical column is missing. In other words, it counts missing values. I don't think this is correct...
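For context, here is the standard pairwise statistic of Dunn's test with the usual tie correction (a textbook formula, not taken from the scikit-posthocs source). Both the total sample size $N$ and the group sizes $n_i$, $n_j$ enter the standard error, so they should count only the observations that are actually ranked. Counting rows whose value is missing would distort the denominator.

$$
z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\left(\frac{N(N+1)}{12} - \frac{\sum_{s}\left(t_s^{3} - t_s\right)}{12(N-1)}\right)\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}
$$

Here $\bar{R}_i$ is the mean rank of group $i$, $n_i$ is the number of non-missing observations in group $i$, $N = \sum_i n_i$, and $t_s$ is the number of observations sharing the $s$-th tied value.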

maximtrp commented 4 months ago

Handling missing values is out of the scope of this package. But I agree that something should be done about it. Maybe throwing a warning will be enough... Anyway, thank you for drawing attention to it.

stikpet commented 4 months ago

A simple `dropna()` at the beginning should be enough to fix this, or, as you say, a warning about the missing values.
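A minimal sketch of the "drop at the beginning" idea, written in plain Python so it runs without pandas or scikit-posthocs. This is an illustration, not the library's implementation. With a pandas DataFrame, the equivalent first step would be `df.dropna(subset=[val_col, group_col])`. The hypothetical `dunn_test` function below removes incomplete observations first, then derives group sizes, ranks and mean ranks from what remains. It returns unadjusted two-sided p-values and applies no multiple-comparison correction.

```python
import math
from itertools import combinations


def _is_missing(x):
    # Assumption: None and float NaN are the only missing-value markers.
    return x is None or (isinstance(x, float) and math.isnan(x))


def dunn_test(values, groups):
    """Hypothetical helper: unadjusted pairwise Dunn z-tests on complete rows only."""
    # Drop every observation whose value or group label is missing.
    pairs = [(v, g) for v, g in zip(values, groups)
             if not _is_missing(v) and not _is_missing(g)]
    n_total = len(pairs)
    if n_total < 2:
        raise ValueError("need at least two non-missing observations")

    # Rank the retained values jointly, giving tied values their average rank.
    order = sorted(range(n_total), key=lambda k: pairs[k][0])
    ranks = [0.0] * n_total
    tie_sum = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pairs[order[j + 1]][0] == pairs[order[i]][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1

    # Group sizes and mean ranks come from the retained rows only.
    rank_lists = {}
    for (_, g), r in zip(pairs, ranks):
        rank_lists.setdefault(g, []).append(r)
    sizes = {g: len(r) for g, r in rank_lists.items()}
    mean_ranks = {g: sum(r) / len(r) for g, r in rank_lists.items()}

    # Tie-corrected variance factor shared by all pairwise comparisons.
    var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))

    results = {}
    for a, b in combinations(sorted(rank_lists), 2):
        se = math.sqrt(var * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, p)
    return sizes, results


values = [1.2, 3.4, float("nan"), 2.2, 5.1, 4.8, None, 6.0]
groups = ["a", "a", "a", "b", "b", "c", "c", "c"]
sizes, results = dunn_test(values, groups)
print(sizes)  # {'a': 2, 'b': 2, 'c': 2}
```

Counting entries in the label column alone would instead give group sizes of 3, 2 and 3 for this input, which is the discrepancy reported in this issue.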

maximtrp commented 4 months ago

Dropping data silently is not a good thing. I will think about it. Let's look at how some bigger packages with millions of users handle this.

stikpet commented 4 months ago

I'm not familiar with any other Python packages that can perform this test. In R, however, the dunn.test function from the dunn.test package gives no warning and simply removes missing values. Another R package, FSA, has a dunnTest function that does issue a warning: "Some rows deleted from 'x' and 'g' because missing data". A little program from IBM named SPSS Statistics also gives no warning and simply excludes missing values from the calculations.
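The two behaviours described here, silent removal versus removal with a warning, could be exposed as an explicit option so that nothing is dropped without the user knowing. The following is a sketch in plain Python under stated assumptions. The `drop_missing` function and its `on_missing` parameter are hypothetical and are not part of scikit-posthocs. The warning text is illustrative and is not FSA's exact message.

```python
import math
import warnings


def drop_missing(values, groups, on_missing="warn"):
    # on_missing is a hypothetical option:
    #   "drop"  -> remove incomplete rows silently (like dunn.test / SPSS)
    #   "warn"  -> remove them and emit a UserWarning (like FSA's dunnTest)
    #   "raise" -> refuse to continue and let the caller decide
    if on_missing not in ("drop", "warn", "raise"):
        raise ValueError(f"unknown on_missing option: {on_missing!r}")
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")

    def is_missing(x):
        # Assumption: None and float NaN are the only missing-value markers.
        return x is None or (isinstance(x, float) and math.isnan(x))

    keep = [not is_missing(v) and not is_missing(g) for v, g in zip(values, groups)]
    n_dropped = keep.count(False)

    if n_dropped and on_missing == "raise":
        raise ValueError(f"{n_dropped} observation(s) have a missing value or group")
    if n_dropped and on_missing == "warn":
        warnings.warn(f"{n_dropped} observation(s) removed because of missing data",
                      stacklevel=2)

    kept_values = [v for v, k in zip(values, keep) if k]
    kept_groups = [g for g, k in zip(groups, keep) if k]
    return kept_values, kept_groups
```

Defaulting to `"warn"` mirrors FSA: the test still runs, but the user is told that rows were removed.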

Thanks for still answering on this and of course for sharing your library.