Open TalWac opened 2 years ago
AFAIK, that's a limitation of how our p-values are implemented in qsturng.
The latest version of scipy has a better implementation of the tukey-hsd distribution.
This might improve if you have both scipy and statsmodels at their latest versions, but I have not tried it yet.
Correction: the change to optionally delegate p-value computation to scipy is in main, but not yet in a release, AFAICS.
pull request #8035
However:
I want to adjust these values with FDR (false discovery rate) afterward. The problem is that p-values much smaller than 0.001 are rounded up to 0.001 and will no longer be significant after the FDR correction.
tukey-hsd already corrects for multiple testing FWER. Why are you still using FDR p-value correction after that?
tukey-hsd already corrects for multiple testing FWER. Why are you still using FDR p-value correction after that?
You are right that it corrects for multiple testing, but only within a single column; in my example above it is called M2 (molecule #2). Since I have ~900 columns (molecules M1, M2, M3, ..., M900), I need to correct for the total number of tests I run, not only the number of pairwise comparisons within one column.
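A small sketch of that second-stage correction, and of why the 0.001 floor breaks it. The data here are synthetic and purely illustrative (not from the thread): a few "true signal" molecules with very small Tukey p-values among many nulls, corrected across molecules with Benjamini-Hochberg via statsmodels' multipletests.

```python
# Illustrative sketch: FDR (Benjamini-Hochberg) across ~900 molecule-level
# p-values, with and without the 0.001 floor that the table-based qsturng
# implementation imposes. All numbers below are made up for illustration.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
pvals = rng.uniform(0.05, 1.0, size=900)   # hypothetical null molecules
pvals[:10] = 1e-6                          # hypothetical true signals

# With accurate p-values, the signals survive BH-FDR at alpha = 0.05 ...
reject_true, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

# ... but once the same p-values are floored at 0.001, they no longer do:
# 0.001 * 900 / 10 = 0.09 > 0.05.
reject_floor, _, _, _ = multipletests(np.clip(pvals, 0.001, None),
                                      alpha=0.05, method="fdr_bh")

print(reject_true[:10].all(), reject_floor[:10].any())  # True False
```

This is exactly the failure mode described above: the floored 0.001 values are no longer small enough to stay significant after correcting across all 900 molecules.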
As for https://github.com/statsmodels/statsmodels/pull/8035, I'm not sure I understood how to combine scipy and statsmodels to get correct p-values.
how to combine scipy and statsmodels to have correct p-values
You need the latest scipy release and the development version of statsmodels.
I found nightly builds, but have never tried those out: https://anaconda.org/scipy-wheels-nightly/statsmodels
Otherwise, statsmodels main needs to be installed from GitHub.
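For reference, a minimal install sketch of that second route (assuming pip is available; building statsmodels from GitHub requires a C compiler and Cython):

```shell
# Latest scipy release from PyPI, statsmodels from the main branch.
pip install --upgrade scipy
pip install git+https://github.com/statsmodels/statsmodels
```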
I'm not yet on the latest scipy myself, but scipy.stats now also has tukey_hsd, which you should be able to use: https://docs.scipy.org/doc/scipy-1.8.0/html-scipyorg/reference/generated/scipy.stats.tukey_hsd.html
As for:
I found nightly version, but never tried those out https://anaconda.org/scipy-wheels-nightly/statsmodels otherwise statsmodels main needs to be installed from github
It was a bit confusing, and I did not try it.
But the last option:
I'm not yet on the latest scipy, but scipy stats has now also tukey-hsd, which you should be able to use https://docs.scipy.org/doc/scipy-1.8.0/html-scipyorg/reference/generated/scipy.stats.tukey_hsd.html
I used the link above, and the p-values are much more accurate and closer to those from the R package.
Thank you very much!!
However, the lower and upper confidence interval values have opposite signs in scipy.stats.tukey_hsd compared to R and to pairwise_tukeyhsd in statsmodels.stats.multicomp:
from scipy.stats import tukey_hsd

# 'dat' is the DataFrame from the code sample in the issue (not repeated here);
# 'Class_0' holds the cell-line labels and 'M2' the values for molecule #2.
dat_EKVX = dat.loc[dat['Class_0'] == 'EKVX', 'M2']
dat_HOP62 = dat.loc[dat['Class_0'] == 'HOP62', 'M2']
dat_HOP92 = dat.loc[dat['Class_0'] == 'HOP92', 'M2']
dat_MPLT4 = dat.loc[dat['Class_0'] == 'MPLT4', 'M2']
dat_RPMI8226 = dat.loc[dat['Class_0'] == 'RPMI8226', 'M2']

res = tukey_hsd(dat_EKVX, dat_HOP62, dat_HOP92, dat_MPLT4, dat_RPMI8226)
print(res)
Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval)
Comparison Statistic p-value Lower CI Upper CI
(0 - 1) 0.010 1.000 -0.726 0.746
(0 - 2) -0.287 0.781 -1.022 0.449
(0 - 3) 1.411 0.000 0.675 2.146
(0 - 4) 3.447 0.000 2.711 4.182
(1 - 0) -0.010 1.000 -0.746 0.726
(1 - 2) -0.297 0.760 -1.032 0.439
(1 - 3) 1.401 0.000 0.665 2.136
(1 - 4) 3.437 0.000 2.701 4.172
(2 - 0) 0.287 0.781 -0.449 1.022
(2 - 1) 0.297 0.760 -0.439 1.032
(2 - 3) 1.697 0.000 0.962 2.433
(2 - 4) 3.733 0.000 2.998 4.469
(3 - 0) -1.411 0.000 -2.146 -0.675
(3 - 1) -1.401 0.000 -2.136 -0.665
(3 - 2) -1.697 0.000 -2.433 -0.962
(3 - 4) 2.036 0.000 1.300 2.772
(4 - 0) -3.447 0.000 -4.182 -2.711
(4 - 1) -3.437 0.000 -4.172 -2.701
(4 - 2) -3.733 0.000 -4.469 -2.998
(4 - 3) -2.036 0.000 -2.772 -1.300
What do you mean by opposite values?
Pair differences can go either way, y0 - y1 or y1 - y0. Your scipy results include both directions, instead of just the lower or upper triangle of all pairs.
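This point can be checked directly with synthetic data (illustrative, not from the thread): for any pair, scipy's (j, i) row is the (i, j) row with the sign flipped, and after negation the lower and upper confidence limits swap roles. So the scipy output is not wrong; R and statsmodels simply report only one direction of each pair.

```python
# Sketch with made-up samples: verify the sign convention of scipy's
# tukey_hsd result matrices (requires scipy >= 1.8).
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(1.0, 1.0, 30)
c = rng.normal(2.0, 1.0, 30)

res = tukey_hsd(a, b, c)
ci = res.confidence_interval()  # 95% CI by default

i, j = 0, 1
# statistic is mean_i - mean_j, so it is antisymmetric ...
assert np.isclose(res.statistic[i, j], -res.statistic[j, i])
# ... and lower(i, j) == -upper(j, i), upper(i, j) == -lower(j, i).
assert np.isclose(ci.low[i, j], -ci.high[j, i])
assert np.isclose(ci.high[i, j], -ci.low[j, i])
```

To match R or pairwise_tukeyhsd, it is enough to read off only one triangle of the scipy matrices.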
Describe the bug
Dear developer,
When I run pairwise_tukeyhsd or MultiComparison.tukeyhsd, the values in p-adj are bounded from below by 0.001 and from above by 0.9. When I compare with the R function TukeyHSD, the p-adj values can be higher than 0.9 and smaller than 0.001.
This is a problem for me since I have many other M# columns (i.e., M1, M2, ..., M900) and I want to adjust these values with FDR (false discovery rate) afterward. Because of the bound, p-values much smaller than 0.001 are rounded up to 0.001 and will no longer be significant after the FDR correction.
Code Sample python:
Code Sample R for the same data:
I have run these for different 'M#' columns (i.e., M1, ..., M900).
There is some difference in the adj-pvalue values, but R and Python are reasonably close; in statsmodels, however, they are always bounded by 0.001 and 0.9.
Is it possible to change this?
Kindly help,
Tal