tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

NaN rows deleted in categorical option #79

Closed Ryo-F closed 4 years ago

Ryo-F commented 5 years ago

Hi, I've been using tableone until I ran into this problem:

When using TableOne(categorical=categorical_columns), if categorical_columns contains more than two columns and each of them contains NaN rows, some rows are deleted because of the dropna() call at tableone.py#L468.

I've created a PR to fix this problem, please take a look!
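For readers following along, the effect described above can be illustrated with plain pandas. This is a minimal sketch on toy data, not the tableone source: it assumes the problem comes from calling dropna() on several columns at once, which removes a row if any of those columns is null. The column names `a` and `b` are made up for the example.

```python
import pandas as pd

# Toy data (hypothetical): each column has a null in a different row.
df = pd.DataFrame({
    "a": ["x", None, "y", "x"],
    "b": ["p", "q", None, "p"],
})

# Joint dropna: a row is removed if ANY of the listed columns is null,
# so valid values in the other column are lost as well.
joint = df[["a", "b"]].dropna()

# Per-column dropna: each column only loses its own nulls.
per_column = {col: df[col].dropna() for col in df.columns}

joint_rows = len(joint)
per_column_rows = {col: len(s) for col, s in per_column.items()}

print(joint_rows)       # 2
print(per_column_rows)  # {'a': 3, 'b': 3}
```

Each column has three valid values, but only two rows survive the joint dropna(), so any counts computed afterwards undercount both columns.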

tompollard commented 5 years ago

Thanks for picking this up Ryo, we'll take a look!

tompollard commented 5 years ago

To reproduce:

import random
import pandas as pd
from tableone import TableOne
random.seed(1)

fruit = ['apple','banana','orange','pineapple','lemon','durian','peach']
n = 4
fruit = [random.sample(fruit, n),
        random.sample(fruit, n),
        random.sample(fruit, n),
        random.sample(fruit, n),
        random.sample(fruit, n),
        random.sample(fruit, n),
        random.sample(fruit, n)]
df = pd.DataFrame(fruit)
df.columns = ['basket1','basket2','basket3','basket4']
df

t1 = TableOne(df, categorical = ['basket1','basket2','basket3','basket4'])
t1

df.loc[1:3,'basket2'] = None
df.loc[2:4,'basket3'] = None

t2 = TableOne(df, categorical = ['basket1','basket2','basket3','basket4'])
t2

In t2, the rows that contain nulls are missing.
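As a rough cross-check for the reproduction above, the counts each column should contribute can be computed without tableone. This sketch only rebuilds the same null pattern (7 rows, nulls in rows 1 to 3 of basket2 and rows 2 to 4 of basket3); the fruit values are placeholders, since the random draws do not affect the counts.

```python
import pandas as pd

cols = ["basket1", "basket2", "basket3", "basket4"]

# Placeholder values; only the null pattern from the reproduction matters here.
df = pd.DataFrame([["apple", "banana", "orange", "lemon"]] * 7, columns=cols)
df.loc[1:3, "basket2"] = None  # .loc slices are label-inclusive: rows 1, 2, 3
df.loc[2:4, "basket3"] = None  # rows 2, 3, 4

# Non-null observations per column when each column is treated on its own.
non_null = {c: int(df[c].notna().sum()) for c in cols}

# Rows left if nulls are dropped across all four columns at once.
surviving = len(df[cols].dropna())

print(non_null)   # {'basket1': 7, 'basket2': 4, 'basket3': 4, 'basket4': 7}
print(surviving)  # 3
```

If every basket in t2 sums to 3 observations instead of 7 or 4, the rows are being dropped jointly rather than per column.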

tompollard commented 5 years ago

Thanks again @Ryo-F. Fixed in version 0.6.0! I'll keep this issue open until we have added some tests to https://github.com/tompollard/tableone/blob/master/test_tableone.py.