tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

How to directly get all the adjusted P-values when performing multiple tests? #99

Closed Brandon96-lab closed 4 years ago

Brandon96-lab commented 4 years ago

Thanks for your nice work on this Python package! It's really convenient, and I regret not finding it earlier!

What I'd like to know is: when performing multiple tests, is there a way to get all of the adjusted P-values at once? The code I tried was:

TableOne(data, columns=columns, categorical=categorical, nonnormal=nonnormal, groupby=groupby, pval_adjust='bonferroni', label_suffix=True, pval=True, htest_name=True)

I was interested in the differences between three subgroups, but I found it only returns a single column of P-values.

Or do I need to subset the data and run the code several times (once for each pair of subgroups)?

Thank you! @tompollard @alistairewj

tompollard commented 4 years ago

When you set pval=True, we compute a single P-value regardless of how many levels there are in groupby. For example:

from scipy import stats
import pandas as pd
from tableone import TableOne  

x = [1, 1, 1]
y = [2, 2, 2]
z = [2, 2, 7]

# add these values to a DataFrame
df = pd.DataFrame({"letter": ["x","x","x",
                              "y","y","y",
                              "z","z","z"],
                   "number": x+y+z})

# add a second column with the same values
df['number2'] = df.number 

print(df)

#   letter  number  number2
# 0      x       1        1
# 1      x       1        1
# 2      x       1        1
# 3      y       2        2
# 4      y       2        2
# 5      y       2        2
# 6      z       2        2
# 7      z       2        2
# 8      z       7        7

t1 = TableOne(df, groupby="letter", pval=True, htest_name=True, 
              overall=False, missing=False, pval_adjust=False)

print(t1.tabulate(tablefmt="github"))   

Outputs:

|                    | x         | y         | z         |   P-Value | Test          |
|--------------------|-----------|-----------|-----------|-----------|---------------|
| n                  | 3         | 3         | 3         |           |               |
| number, mean (SD)  | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |     0.221 | One-way ANOVA |
| number2, mean (SD) | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |     0.221 | One-way ANOVA |

... which is the equivalent of running the following test:

# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html

# The one-way ANOVA tests the null hypothesis that two or more groups have the same
# population mean. The test is applied to samples from two or more groups, possibly with
# differing sizes.

stats.f_oneway(x,y,z)

>>> pvalue=0.22126806334127747

If we want to adjust for multiple comparisons, then we can set the pval_adjust argument as shown below:

t2 = TableOne(df, groupby="letter", pval=True, htest_name=True, 
              overall=False, missing=False, pval_adjust="bonferroni")

print(t2.tabulate(tablefmt="github")) 

|                    | x         | y         | z         |   P-Value (adjusted) | Test          |
|--------------------|-----------|-----------|-----------|----------------------|---------------|
| n                  | 3         | 3         | 3         |                      |               |
| number, mean (SD)  | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |                0.443 | One-way ANOVA |
| number2, mean (SD) | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |                0.443 | One-way ANOVA |

In this case, the p-values are essentially the p-value reported by stats.f_oneway(x,y,z) multiplied by 2 (the number of tests).

_, p = stats.f_oneway(x, y, z)
p * 2

>>> 0.44253612668255493
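For reference, the Bonferroni rule is simple enough to write out by hand; here is a minimal sketch (the helper name is mine, not part of tableone):

```python
def bonferroni_adjust(pvals):
    """Bonferroni correction: multiply each p-value by the number of
    tests performed, capping the result at 1.0."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]

# one raw p-value per row variable (number and number2 are identical here)
raw = [0.22126806334127747, 0.22126806334127747]
print(bonferroni_adjust(raw))  # -> [0.44253612668255493, 0.44253612668255493]
```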

Do you expect to see something different? Are you looking for something more like the pairwise comparisons that are done for standardized mean differences?

t3 = TableOne(df, groupby="letter", pval=True, htest_name=True,  
              overall=False, missing=False, smd=True)

print(t3.tabulate(tablefmt="github")) 

|                    | x         | y         | z         |   P-Value | Test          |   SMD (y,z) |   SMD (x,y) |   SMD (x,z) |
|--------------------|-----------|-----------|-----------|-----------|---------------|-------------|-------------|-------------|
| n                  | 3         | 3         | 3         |           |               |             |             |             |
| number, mean (SD)  | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |     0.221 | One-way ANOVA |       0.816 |         inf |       1.306 |
| number2, mean (SD) | 1.0 (0.0) | 2.0 (0.0) | 3.7 (2.9) |     0.221 | One-way ANOVA |       0.816 |         inf |       1.306 |

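tableone does not currently report pairwise adjusted p-values, but here is a rough sketch of the kind of post hoc comparison being asked about, using scipy's pairwise t-tests with a hand-rolled Bonferroni cap. The group values are made up so that each group has some within-group spread (with the x, y, z data above, two of the groups have zero variance, so a t-test is undefined):

```python
from itertools import combinations
from scipy import stats

# made-up groups with nonzero within-group variance
groups = {"x": [1.0, 1.2, 0.9], "y": [2.0, 2.1, 1.8], "z": [2.0, 2.2, 7.0]}

pairs = list(combinations(groups, 2))  # ('x','y'), ('x','z'), ('y','z')
m = len(pairs)  # number of pairwise tests -> Bonferroni factor

for a, b in pairs:
    p = stats.ttest_ind(groups[a], groups[b]).pvalue
    print(f"{a} vs {b}: raw p = {p:.4f}, Bonferroni-adjusted p = {min(p * m, 1.0):.4f}")
```
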
Brandon96-lab commented 4 years ago

Thank you for your kind help!

I still have some questions about this.

  1. As you said, P-Value (adjusted) = P-Value * 2, but there are three groups. Why "*2" and not "*3"?
  2. I think that when three groups are compared, the overall adjusted P-value shown above is not very meaningful. When performing multiple pairwise tests, like 'x' vs 'y', 'x' vs 'z', and 'y' vs 'z', the Bonferroni-adjusted P-values are meaningful.
  3. We know that Python is considered not as strong at hypothesis testing as SPSS or R; in other words, should results from Python be verified with other statistical software such as SPSS or R before being published in peer-reviewed journals? Still, I think the P-values given directly by Python provide a useful reference. So if the tableone package could provide results for the difference between each pair of subgroups (i.e., the adjusted P-values of 'x' vs 'y', 'x' vs 'z', and 'y' vs 'z'), it would be even more convenient.

My wording may be confusing and not professional. If my opinions or questions are wrong, please correct them without hesitation. It's very kind of you to help me.

tompollard commented 4 years ago

Thanks @Brandon96-lab. I suggest talking through these questions with a statistician, but I've tried to briefly respond to your points below:

As you said, P-Value (adjusted) = P-Value * 2, but there are three groups. Why "*2" and not "*3"?

The Bonferroni correction attempts to compensate for the increased chance of observing a rare event when you carry out multiple tests. In this case we made two tests (one per row variable), and applied a correction factor of 2.

I think that when three groups are compared, the overall adjusted P-value shown above is not very meaningful.

I cannot tell you whether or not the p-value (or adjusted p-value) is meaningful. That's something that you would need to think about, based on your understanding of your data and the methods involved.

We know that Python is considered not as strong at hypothesis testing as SPSS or R; in other words, should results from Python be verified with other statistical software such as SPSS or R before being published in peer-reviewed journals?

1 + 1 = 2 in whichever language you choose, but it is true that SPSS and R are popular with statisticians and it definitely wouldn't hurt to double check your findings in other packages/languages. We will look into support for pairwise comparisons.

Brandon96-lab commented 4 years ago

Thanks a lot. I have read your comments carefully and learned a lot from them. But I couldn't understand why, in this case, two tests were made. I've read the source code of the tableone package, and when computing adjusted P-values it runs the following:

# correct for multiple testing
if self._pval and self._pval_adjust:
    alpha = 0.05
    adjusted = multitest.multipletests(self._htest_table['P-Value'],
                                       alpha=alpha,
                                       method=self._pval_adjust)
    self._htest_table['P-Value (adjusted)'] = adjusted[1]
    self._htest_table['adjust method'] = self._pval_adjust

We have to pass the P-values as the first parameter. For example, if we input:

from statsmodels.stats import multitest

multitest.multipletests([0.01, 0.02, 0.03], alpha=0.05, method='b')[1]

(The P-values we input were calculated beforehand.)

Then it will output: array([0.03, 0.06, 0.09])

And that equals [0.01*3, 0.02*3, 0.03*3].

If we input two P-values, like this:

multitest.multipletests([0.01, 0.02], alpha=0.05, method='b')[1]

then it will output: array([0.02, 0.04])

Then if we input just one P-value, like this:

multitest.multipletests([0.01], alpha=0.05, method='b')[1]

then it will output: array([0.01])
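One more case worth checking is what happens when multiplying would push a p-value above 1: since a probability cannot exceed 1, multipletests clips the adjusted values at 1.0. For example (the input values here are made up):

```python
from statsmodels.stats import multitest

# Bonferroni would give 0.6 * 2 = 1.2 and 0.7 * 2 = 1.4, but p-values
# cannot exceed 1, so multipletests clips both to 1.0
adjusted = multitest.multipletests([0.6, 0.7], alpha=0.05, method='b')[1]
print(adjusted)  # -> [1. 1.]
```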

So we know that "Adjusted P-value" = "P-value" * "the number of P-values we input".

But when we use TableOne, no matter how many groups are in the analysis, the "Adjusted P-value" = "P-value" * 2. I mean, whether there are two, four, five, or more groups, the "Adjusted P-value" is still "P-value" * 2, and it doesn't change.

It seems that by default two tests were made, no matter how many groups there were. I looked through the source code and still couldn't understand where the "*2" comes from.

I feel confused about this. And thanks again for your patience.

jraffa commented 4 years ago

@Brandon96-lab

I think some confusion may be related to what you think is being adjusted.

I took the example from the README.md in this repo and changed the groupby argument to "ICU".

mytable = TableOne(data, columns=columns, categorical=categorical, groupby="ICU", nonnormal=nonnormal, rename=labels, pval=True, missing=False, overall=False)
print(mytable.tabulate(tablefmt="github"))

|                     |      | CCU              | CSRU             | MICU             | SICU             | P-Value |
|---------------------|------|------------------|------------------|------------------|------------------|---------|
| n                   |      | 162              | 202              | 380              | 256              |         |
| Age, median [Q1,Q3] |      | 76.0 [63.0,82.0] | 68.0 [60.0,77.0] | 65.0 [51.0,79.0] | 63.5 [49.0,77.2] | <0.001  |
| SysABP, mean (SD)   |      | 115.5 (35.8)     | 118.5 (21.0)     | 102.4 (47.1)     | 120.5 (46.6)     | <0.001  |
| Height, mean (SD)   |      | 173.6 (31.5)     | 168.3 (15.9)     | 168.6 (26.9)     | 171.3 (12.3)     | 0.187   |
| Weight, mean (SD)   |      | 79.9 (23.2)      | 85.8 (22.5)      | 82.9 (25.1)      | 81.6 (23.0)      | 0.237   |
| ICU, n (%)          | CCU  | 162 (100.0)      |                  |                  |                  | <0.001  |
|                     | CSRU |                  | 202 (100.0)      |                  |                  |         |
|                     | MICU |                  |                  | 380 (100.0)      |                  |         |
|                     | SICU |                  |                  |                  | 256 (100.0)      |         |
| mortality, n (%)    | 0    | 137 (84.6)       | 194 (96.0)       | 318 (83.7)       | 215 (84.0)       | <0.001  |
|                     | 1    | 25 (15.4)        | 8 (4.0)          | 62 (16.3)        | 41 (16.0)        |         |

I will note a couple things:

  1. There are four groups in the ICU variable, but one p-value per variable (row).
  2. There are six variables (rows), and therefore six p-values (including ICU, which appears in both the rows and the columns).

If we now apply the adjustment:

mytable2 = TableOne(data, columns=columns, categorical=categorical, groupby='ICU', nonnormal=nonnormal, rename=labels, pval=True, missing=False, overall=False, pval_adjust='b')
print(mytable2.tabulate(tablefmt="github"))

|                     |      | CCU              | CSRU             | MICU             | SICU             | P-Value (adjusted) |
|---------------------|------|------------------|------------------|------------------|------------------|--------------------|
| n                   |      | 162              | 202              | 380              | 256              |                    |
| Age, median [Q1,Q3] |      | 76.0 [63.0,82.0] | 68.0 [60.0,77.0] | 65.0 [51.0,79.0] | 63.5 [49.0,77.2] | <0.001             |
| SysABP, mean (SD)   |      | 115.5 (35.8)     | 118.5 (21.0)     | 102.4 (47.1)     | 120.5 (46.6)     | <0.001             |
| Height, mean (SD)   |      | 173.6 (31.5)     | 168.3 (15.9)     | 168.6 (26.9)     | 171.3 (12.3)     | 1.000              |
| Weight, mean (SD)   |      | 79.9 (23.2)      | 85.8 (22.5)      | 82.9 (25.1)      | 81.6 (23.0)      | 1.000              |
| ICU, n (%)          | CCU  | 162 (100.0)      |                  |                  |                  | <0.001             |
|                     | CSRU |                  | 202 (100.0)      |                  |                  |                    |
|                     | MICU |                  |                  | 380 (100.0)      |                  |                    |
|                     | SICU |                  |                  |                  | 256 (100.0)      |                    |
| mortality, n (%)    | 0    | 137 (84.6)       | 194 (96.0)       | 318 (83.7)       | 215 (84.0)       | 0.001              |
|                     | 1    | 25 (15.4)        | 8 (4.0)          | 62 (16.3)        | 41 (16.0)        |                    |

You will note there are still six p-values, and some of them have changed. The adjustment is applied on the basis that six tests were done: although it is not quite clear from this table, these are simply the previous table's p-values * 6 (capped at 1), as six p-values in total were computed in the previous table. I would assume that you were seeing "*2" in the cases you were exploring because you had two variables in the rows of your tables.
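In other words, the Bonferroni factor is the number of p-values down the rows of the table, not the number of groupby levels. A toy illustration with six made-up row p-values (roughly mimicking the table above):

```python
# six row variables -> six p-values -> a Bonferroni factor of 6,
# no matter how many ICU groups were compared within each row
raw = [0.00005, 0.00008, 0.187, 0.237, 0.00001, 0.0001]  # made-up values
m = len(raw)
adjusted = [min(p * m, 1.0) for p in raw]
# 0.187 * 6 and 0.237 * 6 both exceed 1, so they cap at 1.0,
# which is why Height and Weight show 1.000 in the adjusted table
print(adjusted)
```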

We will try to make this fact a little more clear in the documentation.

tompollard commented 4 years ago

Just noting here that Jesse updated the documentation for clarity: https://github.com/tompollard/tableone/pull/100

Brandon96-lab commented 4 years ago

Thanks a lot! It's much clearer now. Indeed, I had misunderstood this as post hoc tests. Thanks again for your great work on this package, which has certainly saved us lots of time.