Closed alistairewj closed 7 years ago
I agree that handling of NaNs could be improved (e.g. being explicit about whether or not they are included in the report), but I can't reproduce this particular issue. It may have been fixed since your comment (though the commit that's referenced above actually relates to a different issue).
e.g.:
import pandas as pd
import numpy as np
from tableone import TableOne
n = 10000
data_sample = pd.DataFrame(index=range(n))
mu, sigma = 10, 1
data_sample['normal'] = np.random.normal(mu, sigma, n)
# try np.nan
data_sample['a_nan'] = np.random.normal(mu, sigma, n)
data_sample.loc[1, 'a_nan'] = np.nan  # single .loc call avoids chained assignment
# try adding list containing None
a_none = list(range(n))  # a list, since a Python 3 range does not support item assignment
a_none[1] = None
data_sample['a_none'] = a_none
TableOne(data_sample, continuous = ['normal', 'a_nan', 'a_none'])
Outputs:
Overall
                     overall
-------------------  -----------------
n                    10000
normal (mean (std))  10.00 (1.00)
a_nan (mean (std))   10.00 (1.01)
a_none (mean (std))  5000.00 (2886.61)
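The a_nan and a_none rows above give no indication that one value was missing. As a rough sketch of what "being explicit" could look like, independent of tableone, the snippet below reports a missing-value count next to the mean and standard deviation. The helper name summarize_with_missing and the n_missing column are hypothetical and not part of the tableone API.

```python
import numpy as np
import pandas as pd


def summarize_with_missing(df, columns):
    """Mean, std and missing count per column (hypothetical helper, not part of tableone)."""
    rows = {}
    for col in columns:
        # Coerce to numeric so that None is treated as NaN
        values = pd.to_numeric(df[col], errors="coerce")
        rows[col] = {
            "mean": values.mean(),  # pandas skips NaN by default
            "std": values.std(),    # sample standard deviation (ddof=1)
            "n_missing": int(values.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [0, None, 2]})
summary = summarize_with_missing(df, ["a", "b"])
print(summary)
```

With a count like this in the table, a reader can tell that the statistics were computed on fewer than n rows.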
Seems okay with grouping too:
data_sample['cats'] = np.random.choice(range(0, 2), n)
TableOne(data_sample, continuous = ['normal', 'a_nan', 'a_none'], strata_col= 'cats')
Outputs:
Stratified by cats
                     0                  1
-------------------  -----------------  -----------------
n                    4922               5078
normal (mean (std))  10.02 (1.01)       9.98 (0.99)
a_nan (mean (std))   10.01 (1.00)       9.98 (1.01)
a_none (mean (std))  5085.98 (2900.32)  4916.65 (2871.08)
Trying a categorical variable, stratified by a column containing NaN:
# float dtype so the column can hold a missing value without an upcast
data_sample['nyan_cats'] = np.random.choice(range(0, 2), n).astype(float)
data_sample.loc[1, 'nyan_cats'] = None  # stored as NaN
TableOne(data_sample, continuous=['normal', 'a_nan', 'a_none'],
         categorical=['cats'], strata_col='nyan_cats')
Outputs:
Stratified by nyan_cats
                     0.0                1.0
-------------------  -----------------  -----------------
n                    4959               5040
normal (mean (std))  9.99 (0.99)        10.01 (1.00)
a_nan (mean (std))   10.00 (1.00)       9.99 (1.01)
a_none (mean (std))  5036.44 (2885.80)  4964.15 (2887.24)
cats (n (%))
  0                  2524 (50.90)       2398 (47.58)
  1                  2435 (49.10)       2642 (52.42)
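Note that the stratum sizes above add up to 9999 rather than 10000, which is consistent with the row whose nyan_cats value is missing being left out of both strata. Whether tableone relies on pandas groupby internally is an assumption here, but the same effect can be shown in plain pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0],
    "group": [0.0, np.nan, 1.0, 1.0],  # one row has a missing group label
})

# Default behaviour: rows whose key is NaN are left out of every group
sizes_default = df.groupby("group").size()

# dropna=False (pandas >= 1.1) keeps the NaN key as its own group
sizes_with_nan = df.groupby("group", dropna=False).size()

print(sizes_default.sum())   # 3
print(sizes_with_nan.sum())  # 4
```

Reporting the NaN group, or at least the number of rows dropped, would make the stratified table explicit about missing stratum labels.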
It seems this was fixed.
At the moment it seems that any NaN values will cause the output summary to be nan (nan).
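One plausible cause of a nan (nan) summary, offered as an assumption rather than a diagnosis of the tableone code at the time, is a plain NumPy reduction, which propagates NaN. A minimal sketch of the difference between the NaN-propagating and NaN-aware calls:

```python
import numpy as np
import pandas as pd

values = np.array([9.0, 10.0, np.nan, 11.0])

# Plain NumPy reductions propagate NaN, so both results are nan
plain_mean = np.mean(values)
plain_std = np.std(values)

# NaN-aware variants ignore the missing entry
nan_mean = np.nanmean(values)
nan_std = np.nanstd(values, ddof=1)

# pandas skips NaN by default (skipna=True) and uses ddof=1 for std
series = pd.Series(values)
pd_mean = series.mean()
pd_std = series.std()

print(plain_mean, plain_std)
print(nan_mean, nan_std)
print(pd_mean, pd_std)
```

Either of the NaN-aware forms yields a finite mean and standard deviation computed from the three observed values.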