Vagueness in p-value warning

tompollard / tableone

Create "Table 1" for research papers in Python

https://pypi.python.org/pypi/tableone/

MIT License


Vagueness in p-value warning #70

Closed shahzeb1 closed 6 years ago

shahzeb1 commented 6 years ago

https://github.com/tompollard/tableone/blob/1e920cdeaa44306379cd942a8fdabfea849e1fb4/tableone.py#L557

I feel like the warning shown here is vague, especially because, despite saying "No p-value was computed", it does end up computing a p-value.

If you explain why this warning is being generated, I would be more than happy to work on a PR which gives a more appropriate and verbose warning.

alistairewj commented 6 years ago

Can you give an example of where it does calculate a p-value? That function looks to return pval = np.nan.

shahzeb1 commented 6 years ago

I'm not sure I follow what you mean.

Are you asking about the source code for tableone or are you asking me about how I'm using tableone?

alistairewj commented 6 years ago

In the line you highlight, it returns pval, which is set to np.nan two lines above. So I was wondering where it was that you found a pval being computed. Is it just not clear that in one of the many comparisons it makes, a pval wasn't computed?
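To illustrate the pattern described above, here is a minimal sketch of a function that sets the p-value to `np.nan`, emits a warning, and returns. It is not tableone's actual code: the helper name `pval_or_nan`, the condition that triggers the warning, and the example tables are all assumptions made for illustration. It shows how one variable can trigger the warning while other variables in the same table still get a p-value.

```python
import warnings

import numpy as np
from scipy.stats import chi2_contingency


def pval_or_nan(table):
    """Hypothetical helper: return a chi-squared p-value, or NaN with a warning."""
    table = np.asarray(table)
    # Assumed trigger: a table with fewer than two rows or columns cannot be tested.
    if table.ndim != 2 or min(table.shape) < 2:
        pval = np.nan
        warnings.warn("No p-value was computed for this variable.")
        return pval
    _, pval, _, _ = chi2_contingency(table)
    return pval


# Hypothetical contingency tables, one per variable.
tables = {
    "age_group": [[20, 15], [10, 25]],  # testable 2 x 2 table
    "single_level": [[30, 40]],         # only one category, so no test is possible
}
pvals = {name: pval_or_nan(t) for name, t in tables.items()}
```

Here only `single_level` ends up as NaN, even though the warning text appears once for the whole run.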

shahzeb1 commented 6 years ago

OK gotchu, I see that now.

I guess what I'm asking is: why is this warning being raised?

Is it because there is a lack of values for a specific group?

Here is the HTML output:

I'm just trying to better understand that p-test warning and see how to get rid of it / fix the data.

jraffa commented 6 years ago

I think this warning is different from the one cited in the original post.

This one indicates that an asymptotic test like Pearson's chi-squared isn't a good choice here, because the cell counts are too small. That looks to be primarily due to missing data. I can't see the bottom of the language variables: are there any observations in the suicide=1 group?

I believe we have Fisher's exact test implemented, but only for the 2x2 case. Here it looks like you have a kx2 table.
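As a rough way to check whether a table is too sparse for an asymptotic test, one can inspect the expected cell counts. The sketch below uses SciPy directly rather than tableone. The helper name `sparse_cells`, the threshold of 5 (a common rule of thumb, not something stated in this thread), and the example counts are assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact


def sparse_cells(table, min_expected=5.0):
    """Return the expected counts and whether any falls below min_expected."""
    table = np.asarray(table)
    _, _, _, expected = chi2_contingency(table)
    return expected, bool((expected < min_expected).any())


# Hypothetical k x 2 table: rows are categories, columns are the two groups.
kx2 = np.array([[120, 3], [45, 1], [30, 2]])
expected, is_sparse = sparse_cells(kx2)

# Hypothetical 2 x 2 table, where Fisher's exact test applies directly.
two_by_two = np.array([[8, 2], [1, 5]])
odds_ratio, p_exact = fisher_exact(two_by_two)
```

In the k x 2 example the second group is small, so its expected counts fall below the threshold and `is_sparse` is `True`. `scipy.stats.fisher_exact` accepts only 2 x 2 tables, which mirrors the limitation mentioned above.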

shahzeb1 commented 6 years ago

Ah I see, I thought the warning was referencing the same thing.

Here is a link to the HTML file: https://cdn.rawgit.com/CrivelliLab/structured_suicide/7e5d74d0/output/basic_stats_patient_info.html

Yeah, that's what confused me: even though the terminal warns that "No p-value was computed" (referring to the originally cited line of code), it still ends up generating some sort of p-value.

I'm just trying to figure out what to do with the data so that the p-value calculation doesn't raise a warning and gives a more sensible result.

jraffa commented 6 years ago

So this looks like pretty sparse data.

If this is MIMIC data, I usually collapse some of these groups, so you end up with a sufficient number in each.

Alternatively, you can set pval=False. The objective of this study is unclear to me, so this may be reasonable if you're simply trying to summarise the data.
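Collapsing sparse groups can be done in pandas before building the table. This is a minimal sketch, not a recommendation for any particular dataset: the helper name `collapse_rare`, the cutoff of 10, the label "OTHER", and the example language codes are all assumptions.

```python
import pandas as pd


def collapse_rare(series, min_count=10, other_label="OTHER"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with other_label.
    return series.where(~series.isin(rare), other_label)


# Hypothetical categorical column with two sparse levels.
language = pd.Series(["ENGL"] * 50 + ["SPAN"] * 12 + ["RUSS"] * 3 + ["PORT"] * 2)
collapsed = collapse_rare(language)
```

After collapsing, the two rare levels are merged into a single "OTHER" category with 5 observations. Which levels make sense to merge is a subject-matter decision, so a count-based cutoff like this is only a starting point.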

shahzeb1 commented 6 years ago

Yes it is MIMIC.

Yes, I saw the pval option, and I just wanted to use tableone's p-value test. Seemed like a neat option, but it might not be fit for my use-case.

So due to the sparsity of the second group, it's basically unable to accurately generate the tests?

Also, one general question: do the p-value tests go row by row, comparing the two groups on each row, or do they look at an entire column and compare it to the second group's column? I assume it's the latter.

Thank you for your quick responses by the way, really appreciate it.

jraffa commented 6 years ago

For sparse categorical data, it's generally not a great idea to use Pearson's chi-squared test. Fisher's exact test is an alternative, but we have not implemented the general (mxn) case yet. I think we are hoping to do so in the future, but don't have an ETA.

Yes, it looks at the latter, and outputs one p-value.
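The "one p-value per variable" behaviour can be sketched with a contingency table built from all levels of a categorical variable. This uses pandas and SciPy directly rather than tableone, and the column names and counts are made up for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one categorical variable and a two-level grouping variable.
df = pd.DataFrame({
    "insurance": ["Medicare"] * 40 + ["Private"] * 35 + ["Medicaid"] * 25,
    "group": ([0] * 30 + [1] * 10) + ([0] * 20 + [1] * 15) + ([0] * 15 + [1] * 10),
})

# Rows are the levels of the variable, columns are the groups.
table = pd.crosstab(df["insurance"], df["group"])

# A single test over the whole table gives one p-value for the variable,
# not one p-value per level.
chi2, p, dof, expected = chi2_contingency(table)
```

With three levels and two groups the test has (3 - 1) x (2 - 1) = 2 degrees of freedom, and `p` is the single value that would be reported for the variable.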

shahzeb1 commented 6 years ago

Thank you for the help.