Closed shahzeb1 closed 6 years ago
Can you give an example of where it does calculate a p-value? That function looks to return pval = np.nan.
I'm not sure I follow what you mean.
Are you asking about the source code for tableone or are you asking me about how I'm using tableone?
In the line you highlight, it returns pval, which is set to np.nan two lines above. So I was wondering where it was that you found a pval being computed. Is it just not clear that, in one of the many comparisons it makes, a pval wasn't computed?
OK gotchu, I see that now.
I guess what I'm asking is: why is this warning being raised?
Is it because there is a lack of values for a specific group?
Here is the HTML output:
I'm just trying to better understand that p-test warning and see how to get rid of it / fix the data.
I think this warning is different from the one cited in the original post.
This one indicates that an asymptotic test like Pearson's Chi-squared isn't a good choice here, because the cell counts are too small (looks primarily to be from missing data -- I can't see the bottom of the language variables -- does it have any observed in the suicide=1 group?).
I believe we have Fisher's exact test implemented, but it's only for the 2x2 case. Here it looks like you are kx2.
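As a rough illustration of the small-cell-count problem, here is a minimal sketch using SciPy directly rather than tableone. The table values are made up, and the "expected count below 5" threshold is a common rule of thumb, not something taken from tableone's source.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows are categories, columns are the two groups.
# The second group is deliberately sparse.
table = np.array([[12, 1],
                  [30, 2]])

# Pearson's chi-squared test; `expected` holds the expected cell counts
# under independence.
chi2, p_chi2, dof, expected = chi2_contingency(table)

# Rule of thumb (an assumption here): the asymptotic approximation is
# unreliable when any expected cell count is below 5.
sparse = bool((expected < 5).any())

# Fisher's exact test applied to the same 2x2 table.
odds_ratio, p_fisher = fisher_exact(table)

print("any expected count < 5:", sparse)
print("chi-squared p:", p_chi2, "Fisher p:", p_fisher)
```

When `sparse` is true, the chi-squared p-value is still computed, but it should be read with caution, which is what the warning is pointing at.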
Ah I see, I thought the warning was referencing the same thing.
Here is a link to the HTML file: https://cdn.rawgit.com/CrivelliLab/structured_suicide/7e5d74d0/output/basic_stats_patient_info.html
Yeah, that's what confused me, because even though the terminal warns that "No p-value was computed" (referring to the originally cited line of code), it still ends up generating some sort of p-value.
I'm just trying to figure out what to do with the data so the p-value doesn't raise any warning and can do a more sensible calculation.
So this looks like pretty sparse data.
If this is MIMIC data, I usually collapse some of these groups, so you end up with a sufficient number in each.
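Collapsing could look something like the sketch below, done in pandas before the table is built. The column name, the category labels, the "OTHER" label, and the threshold of 5 are all illustrative assumptions, not values from the data in this thread.

```python
import pandas as pd

# Hypothetical data with one dominant category and several rare ones.
df = pd.DataFrame({
    "language": ["ENGL"] * 40 + ["SPAN"] * 6 + ["RUSS"] * 2
                + ["PORT"] * 2 + ["CANT"] * 1,
})

min_count = 5  # assumed threshold for a "sufficient" number of observations

# Find the categories that occur fewer than min_count times.
counts = df["language"].value_counts()
rare = counts[counts < min_count].index

# Keep common categories as they are and pool the rare ones into "OTHER".
df["language_collapsed"] = df["language"].where(
    ~df["language"].isin(rare), "OTHER"
)

collapsed_counts = df["language_collapsed"].value_counts()
print(collapsed_counts)
```

This only looks at overall counts. With a grouping variable, the counts within each group matter too, so the pooled category may still need checking per group.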
Alternatively, you can set pval=False. The objective of this study is unclear to me, so this may be reasonable if you're simply trying to summarise the data.
Yes it is MIMIC.
Yes, I saw the pval option, and I just wanted to use tableone's p-value test. Seemed like a neat option, but it might not be fit for my use-case.
So due to the sparsity of the second group, it's basically unable to accurately generate the tests?
Also, one general question: do the p-value tests go row by row, comparing the two groups on each row, or do they look at an entire column and compare it to the second group's column? I assume it's the latter.
Thank you for your quick responses by the way, really appreciate it.
For sparse categorical data, it's generally not a great idea to use Pearson's chi-squared test. Fisher's exact test is an alternative, but we have not implemented the general (mxn) case yet. I think we are hoping to do so in the future, but don't have an ETA.
Yes, it's the latter: it looks at the variable as a whole across both groups and outputs one p-value.
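To make the "one p-value per variable" point concrete, here is a small sketch with pandas and SciPy. It shows the general idea of testing a whole k x 2 table at once and is not necessarily how tableone does it internally; the column names and counts are invented.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: a 3-level categorical variable and a binary group.
df = pd.DataFrame({
    "insurance": ["Medicare"] * 30 + ["Private"] * 20 + ["Medicaid"] * 10
                 + ["Medicare"] * 12 + ["Private"] * 10 + ["Medicaid"] * 8,
    "group": [0] * 60 + [1] * 30,
})

# One contingency table for the whole variable: 3 levels x 2 groups.
table = pd.crosstab(df["insurance"], df["group"])

# A single test over the full table gives a single p-value for the variable,
# not one p-value per level (row).
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print("degrees of freedom:", dof, "p-value:", p_value)
```

Each level of the variable is a row of the table, but the test compares the two groups' distributions across all levels together.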
Thank you for the help.
https://github.com/tompollard/tableone/blob/1e920cdeaa44306379cd942a8fdabfea849e1fb4/tableone.py#L557
I feel like the warning being shown here is vague, especially because, despite saying that "No p-value was computed", it does end up computing a p-value.
If you explain why this warning is being generated, I would be more than happy to work on a PR which gives a more appropriate and verbose warning.