tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Allow `None` as input for categorical/continuous #5

Closed: alistairewj closed this issue 6 years ago

alistairewj commented 7 years ago

Instead of requiring an empty list, it would seem appropriate to allow the continuous and categorical inputs to be None. At the moment, passing None raises a TypeError. Example:

import pandas as pd
import numpy as np
from tableone import TableOne

n = 10000
data_sample = pd.DataFrame(index=range(n))

mu, sigma = 10, 1
data_sample['normal'] = np.random.normal(mu, sigma, n)
data_sample['nonnormal'] = np.random.noncentral_chisquare(20,nonc=2,size=n)

TableOne(data_sample, continuous = ['normal'], categorical = None)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-d1c5f121e36f> in <module>()
     10 data_sample['nonnormal'] = np.random.noncentral_chisquare(20,nonc=2,size=n)
     11 
---> 12 TableOne(data_sample, continuous = ['normal'], categorical = None)

/home/alistairewj/git/tableone/tableone.py in __init__(self, data, continuous, categorical, strata_col, nonnormal, pval)
     35         if nonnormal and type(nonnormal) == str:
     36             nonnormal = [nonnormal]
---> 37 
     38         self.__check_input_arguments_for_overlap(continuous,categorical,'continuous','categorical')
     39         self.__check_input_arguments_in_df(data.columns,continuous+categorical+nonnormal)

/home/alistairewj/git/tableone/tableone.py in __check_input_arguments_for_overlap(self, a, b, a_name, b_name)
     93         """
     94         Check the input argument for duplicate columns
---> 95         """
     96         if bool(set(a) & set(b)):
     97             overlap = [val for val in a if val in b]

TypeError: 'NoneType' object is not iterable
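
A minimal sketch of the kind of normalization that would avoid this error, assuming None should simply mean "no columns of this type". The helper names `as_column_list` and `find_overlap` are hypothetical and not part of tableone; the overlap logic mirrors the list comprehension visible in the traceback above.

```python
def as_column_list(value):
    """Normalize a column argument to a list (hypothetical helper, not tableone API)."""
    if value is None:
        return []        # assumption: None means "no columns of this type"
    if isinstance(value, str):
        return [value]   # a single column name passed as a bare string
    return list(value)


def find_overlap(a, b):
    """Return the columns that appear in both arguments, tolerating None."""
    a, b = as_column_list(a), as_column_list(b)
    return [val for val in a if val in b]


print(find_overlap(['normal'], None))            # prints [] instead of raising TypeError
print(find_overlap(['normal', 'age'], ['age']))  # prints ['age']
```

Normalizing once at the top of `__init__` would presumably also cover the later `continuous+categorical+nonnormal` concatenation, which would otherwise fail on None as well.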

tompollard commented 7 years ago

We can implement this, but is there a reason why you wouldn't just omit the categorical argument altogether?

e.g.

# only continuous variables

import pandas as pd
import numpy as np
from tableone import TableOne

n = 10000
data_sample = pd.DataFrame(index=range(n))

mu, sigma = 10, 1
data_sample['normal'] = np.random.normal(mu, sigma, n)
data_sample['nonnormal'] = np.random.noncentral_chisquare(20,nonc=2,size=n)

TableOne(data_sample, continuous = ['normal', 'nonnormal'], nonnormal=['nonnormal'])

Outputs:

Overall
                          overall
------------------------  -------------------
n                         10000
normal (mean (std))       10.00 (1.00)
nonnormal (median [IQR])  21.23 [17.04,26.30]