raphaelvallat / pingouin

Statistical package in Python based on Pandas
https://pingouin-stats.org/
GNU General Public License v3.0

output of normality is text not bool when testing fewer than 4 samples #373

Closed dalensis closed 11 months ago

dalensis commented 1 year ago

Hi, thank you for the exceptional library you developed. The output of the normality function is not a bool when a sample has fewer than 4 replicates. The test emits a warning and returns NaN for the p-value. It still writes False in the normal column, but the value is now a string and not a bool.

Best

raphaelvallat commented 1 year ago

Hi @dalensis,

Thanks for opening the issue! Can you please provide minimal code to reproduce the issue? Thanks!

dalensis commented 1 year ago

Here it is:

from pingouin import normality
import seaborn as sns

data = sns.load_dataset("penguins")[:3]
print(data)

# Test normality of the third column, grouped by the second column
normal = normality(data, dv=data.columns[2], group=data.columns[1])
print(normal)

if normal["normal"].all():
    print("ALL TRUE test bool")
else:
    print("at least 1 false test bool")

# Sometimes the values in "normal" are str, so compare their string form
if all(str(ele).lower().capitalize() == "True" for ele in normal["normal"]):
    print("ALL TRUE test string")
else:
    print("At least 1 false test string")

OUTPUT:

  species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1  ...              181.0       3750.0    Male
1  Adelie  Torgersen            39.5  ...              186.0       3800.0  Female
2  Adelie  Torgersen            40.3  ...              195.0       3250.0  Female

[3 rows x 7 columns]

             W  pval normal
island
Torgersen  NaN   NaN  False

ALL TRUE test bool
At least 1 false test string

Python311\Lib\site-packages\pingouin\distribution.py:242: UserWarning: Group Torgersen has less than 4 valid samples. Returning NaN.
  warnings.warn(f"Group {idx} has less than 4 valid samples. Returning NaN.")

Comment: the two lines "ALL TRUE test bool" and "At least 1 false test string" are the results of the two tests. When the column holds strings, the check for all-True boolean values passes, because any non-empty string, including "False", is truthy.
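The truthiness pitfall described above can be reproduced without Pingouin. The following is a minimal sketch, assuming the affected column ends up with object dtype; the Series below are hypothetical stand-ins for the `normal` column, not actual output from `normality`.

```python
import pandas as pd

# Hypothetical stand-ins for the "normal" column (not Pingouin output).
as_bool = pd.Series([False], index=["Torgersen"], dtype=bool)
as_str = pd.Series(["False"], index=["Torgersen"], dtype=object)

# A real boolean False makes the reduction False.
bool_result = bool(as_bool.all())

# The string "False" is non-empty, hence truthy, so the reduction is truthy.
str_result = bool(as_str.all())

# Comparing each element to True first avoids relying on truthiness.
strict_result = bool(as_str.eq(True).all())

print(bool_result, str_result, strict_result)
```

Comparing against `True` explicitly is one possible workaround on the caller's side until the column is a proper boolean.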

raphaelvallat commented 1 year ago

Thank you. You are right that Pingouin incorrectly returns False as a string:

https://github.com/raphaelvallat/pingouin/blob/7923141161564b7a065b75f44f5fc75a2c1a1aa2/pingouin/distribution.py#L244

Actually, I think it might make more sense to return either np.nan or a nullable boolean.

If you'd like, please feel free to submit a PR to fix this behavior. Thanks
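For reference, here is a rough sketch of how a nullable boolean column behaves in pandas. It only illustrates the `"boolean"` extension dtype and is not the fix that was applied in Pingouin; the Series and the second group name ("Biscoe") are hypothetical.

```python
import pandas as pd

# Hypothetical "normal" column: one group with too few samples (missing),
# and one group that passed the test.
normal_col = pd.Series(
    [pd.NA, True], index=["Torgersen", "Biscoe"], dtype="boolean"
)

# The missing entry is a real missing value, not the string "False".
n_missing = int(normal_col.isna().sum())

# By default, missing values are skipped by the reduction.
all_skipna = bool(normal_col.all())

# With skipna=False the result is itself missing (pd.NA).
all_strict = normal_col.all(skipna=False)

# Callers who want "unknown" to count as "not normal" can say so explicitly.
all_filled = bool(normal_col.fillna(False).all())

print(n_missing, all_skipna, all_strict, all_filled)
```

With this dtype the caller has to decide how to treat untested groups, rather than getting a silently truthy string.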