tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Continuous describe can now handle mixed datatypes #48

alistairewj closed 6 years ago

alistairewj commented 6 years ago

Changes

tompollard commented 6 years ago

I'd like to merge most of this but want to avoid the cherry-pick again because of the problems it caused last time! One of the commits (I think 5dff45a) seems to throw out the numbers in the null column, and it raises a warning when the grouped table is created:

# create an instance of TableOne with the input arguments
grouped_table = TableOne(data, columns, categorical, groupby, nonnormal)
/usr/local/lib/python3.6/site-packages/pandas/core/indexes/api.py:87: 
RuntimeWarning: '<' not supported between instances of 'str' and 'float', 
sort order is undefined for incomparable objects
  result = result.union(other)

Instead of coercing string-containing fields to numbers, maybe we should improve data type checks on the input? e.g. if a column contains non-numerical values but it is not specified as categorical, we could fail with:

<column> does not appear to be numerical. Either specify as a categorical 
variable or remove the non-numerical values
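A rough sketch of what such a check could look like is below. The function name `check_continuous_columns` and its arguments are hypothetical and are not part of tableone; it only assumes that the input is a pandas DataFrame and that any column not listed as categorical is meant to be numerical.

```python
import pandas as pd


def check_continuous_columns(data, columns, categorical):
    """Hypothetical helper: fail if a non-categorical column holds non-numerical values."""
    for col in columns:
        if col in categorical:
            continue
        # Missing values are allowed, so only the non-null entries are inspected.
        non_null = data[col].dropna()
        # Coercion is used purely to detect bad values; the result is discarded.
        coerced = pd.to_numeric(non_null, errors="coerce")
        if coerced.isna().any():
            raise ValueError(
                f"{col} does not appear to be numerical. Either specify as a "
                "categorical variable or remove the non-numerical values"
            )


# Example data (made up for illustration): "weight" contains a thousands separator.
example = pd.DataFrame({
    "age": [30, 41, None],
    "weight": ["900", "1,500", "100"],
    "sex": ["F", "M", "F"],
})
```

Calling `check_continuous_columns(example, ["age", "weight", "sex"], ["sex"])` would raise for `weight`, while restricting the columns to `["age", "sex"]` would pass.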

I think I prefer an explicit fail because coercion may cause numbers to be misreported in certain cases, e.g. when commas are used as a separator for thousands:

"900"
"500"
"1,500"
"100"

The mean is reported as 500 instead of 750, because "1,500" is coerced to a missing value and silently dropped. Similar issues might come up with commas as a decimal separator (e.g. "1,5" for 1.5), fractions (e.g. 1/2 for 0.5), etc.
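The arithmetic can be reproduced with a short sketch. It assumes the coercion is done with `pd.to_numeric(..., errors="coerce")`, which may not be exactly what the commit under discussion does.

```python
import pandas as pd

values = pd.Series(["900", "500", "1,500", "100"])

# "1,500" cannot be parsed, so it becomes NaN rather than raising an error.
coerced = pd.to_numeric(values, errors="coerce")

# mean() skips NaN by default: (900 + 500 + 100) / 3
silent_mean = coerced.mean()

# Stripping the thousands separator first gives the intended result:
# (900 + 500 + 1500 + 100) / 4
intended_mean = pd.to_numeric(values.str.replace(",", "", regex=False)).mean()

print(silent_mean, intended_mean)  # 500.0 750.0
```

Note that stripping commas is only safe when they are known to be thousands separators; for "1,5" meaning 1.5 it would produce 15, which is another reason to fail explicitly instead of guessing.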

alistairewj commented 6 years ago

Now explicitly fails and fixed a bug for counting null values!