tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License
159 stars, 38 forks

TypeError: '<' not supported between instances of 'str' and 'int' when setting pval=True #160

Open erdie721 opened 4 months ago

erdie721 commented 4 months ago

I'm getting the following error whenever I set pval=True on a dataset. I'm using Jupyter Notebook via Anaconda, running Python 3.9.


TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_2037601/677072381.py in <module>
----> 1 results = TableOne(data = df, columns = columns, groupby = groupby, nonnormal = nonnormal, categorical = categorical,pval=True)

~/anaconda3/lib/python3.9/site-packages/tableone/tableone.py in __init__(self, data, columns, categorical, groupby, nonnormal, min_max, pval, pval_adjust, htest_name, pval_test_name, htest, isnull, missing, ddof, labels, rename, sort, limit, order, remarks, label_suffix, decimals, smd, overall, row_percent, display_all, dip_test, normal_test, tukey_test)
    386         # forgive me jraffa
    387         if self._pval:
--> 388             self._htest_table = self._create_htest_table(data)
    389
    390         # correct for multiple testing

~/anaconda3/lib/python3.9/site-packages/tableone/tableone.py in _create_htest_table(self, data)
   1113             # if categorical, create contingency table
   1114             elif is_categorical:
-> 1115                 catlevels = sorted(data[v].astype('category').cat.categories)
   1116                 cross_tab = pd.crosstab(data[self._groupby].
   1117                                         rename('_groupbyvar'), data[v])

TypeError: '<' not supported between instances of 'str' and 'int'
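The last frame points at the likely cause. Below is a minimal sketch of the same failure, assuming a column that mixes ints with a string such as "<5" (the data is made up for illustration). pandas accepts the mixed values as categories, but Python 3 refuses to order a str against an int, so the sorted() call raises the same kind of TypeError.

```python
import pandas as pd

# Hypothetical data: a lab value column where one entry was recorded as "<5"
# instead of a number, so the column mixes Python ints and a str.
values = pd.Series([3, 7, "<5", 12])

# Converting to category works: pandas keeps the mixed values as categories
# (it just cannot put them in sorted order).
categories = values.astype("category").cat.categories

# Sorting the categories, as the failing line in _create_htest_table does,
# raises because Python 3 cannot compare str and int with '<'.
message = None
try:
    sorted(categories)
except TypeError as err:
    message = str(err)

print(message)
```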

ExtremeCoolDude commented 2 months ago

Can anyone help me understand this issue, please? I'm having the same problem.

The values are stratified and numeric, so I don't understand where this error is coming from.


tompollard commented 1 month ago

Sorry for the delay; I'll work on bug fixes this week!

tompollard commented 1 month ago

I'm not quite sure what's going on here, but I think it's a data type issue (one of your columns appears to contain a mix of strings and numbers). Are you able to share a dataset that can be used to reproduce the error?
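To check for this before calling TableOne, a small helper like the one below can list object-dtype columns whose non-missing values have more than one Python type. find_mixed_type_columns is a hypothetical name, not part of tableone, and the example DataFrame is invented.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def find_mixed_type_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each object-dtype column whose non-missing
    values span more than one Python type to the set of type names found."""
    mixed = {}
    for col in df.columns:
        # Numeric, datetime, string and category columns have one dtype,
        # so only generic object columns can hide mixed Python types.
        if not is_object_dtype(df[col]):
            continue
        types = {type(v).__name__ for v in df[col].dropna()}
        if len(types) > 1:
            mixed[col] = types
    return mixed


df = pd.DataFrame({
    "age": [34, 51, 47],
    "crp": [3, "<5", 12],  # mixes int and str
    "sex": ["F", "M", "F"],
})
print(find_mixed_type_columns(df))
```

A column flagged here is the kind that could reach the failing sorted() call if tableone treats it as categorical.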

erdie721 commented 1 month ago

Yeah, I think some of the columns had entries with a < or > attached to some of the numbers. Is there any way to just exempt those columns from having a p-value calculated? Or could the error message be slightly more descriptive of the issue?
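Until the library handles this, one possible workaround is to leave such columns out of the columns list passed to TableOne, or to clean them first. The sketch below assumes the only non-numeric content is a leading < or >. It uses a hypothetical helper, strip_inequality_signs, that drops the sign and coerces anything else it cannot parse to missing. Whether treating "<5" as 5 is acceptable depends on the analysis.

```python
import pandas as pd


def strip_inequality_signs(series: pd.Series) -> pd.Series:
    """Hypothetical cleaning step: turn entries such as "<5" or ">100" into
    numbers by dropping the leading sign; anything else that cannot be
    parsed (e.g. "n/a") becomes missing. Note: "<5" is treated as exactly 5."""
    as_text = series.astype(str).str.strip().str.lstrip("<>")
    return pd.to_numeric(as_text, errors="coerce")


df = pd.DataFrame({"crp": [3, "<5", 12, ">100", "n/a"]})
df["crp_clean"] = strip_inequality_signs(df["crp"])
print(df["crp_clean"].tolist())
```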

tompollard commented 1 month ago


Yes, definitely. I'll have a think about how best to handle this.
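One possible direction for a more descriptive error: the level-sorting step could catch the TypeError, warn with the offending column's name, and fall back to ordering levels by their string form. This is only a sketch, not tableone's actual code or a committed fix, and sort_levels is a hypothetical function.

```python
import warnings


def sort_levels(levels, column_name):
    """Hypothetical fallback: sort category levels normally, but if they mix
    types that cannot be compared, warn with the column name and sort them
    by their string representation instead."""
    try:
        return sorted(levels)
    except TypeError:
        type_names = sorted({type(v).__name__ for v in levels})
        warnings.warn(
            f"Column '{column_name}' mixes value types {type_names}; "
            "sorting its levels as strings.",
            stacklevel=2,
        )
        return sorted(levels, key=str)


print(sort_levels([3, 7, "<5", 12], "crp"))  # → [12, 3, 7, '<5']
```

The trade-off is that string ordering is lexical, so 12 sorts before 3. A stricter alternative would be to raise a ValueError that names the column instead of falling back.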