tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Option to display data for binary variables in only a single row #59

Closed (christian-geier closed this 4 years ago)

christian-geier commented 6 years ago

For binary/boolean variables such as smoker (Y/N) or ascites (Y/N), I currently enter each one as a categorical variable. This results in two rows per variable:

Variable Group1 n=10 Group2 n=10
Smoker 0 5 (25%)
1 15 (75%)
Ascites 0 17 (85%)
1 3 (15%)

etc...

The desired output would include only the positive level of each variable:

Variable Group1 n=10 Group2 n=10
Smoker 1 15 (75%)
Ascites 1 3 (15%)

The reason is that when a variable has only two mutually exclusive levels, the second row is redundant: the reader can understand the data just as easily (in fact perhaps more easily) from a single row.
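To make the request concrete, here is a minimal sketch in plain Python of the single-row summary being asked for. It is not tableone code: the helper name `positive_row` and the sample lists are made up for illustration, with the lists built to match the counts shown in the example tables above (15 of 20 smokers, 3 of 20 with ascites).

```python
from collections import Counter


def positive_row(values, positive=1):
    """Format 'n (pct%)' for the positive level of a binary variable.

    Hypothetical helper for illustration only; not part of tableone.
    """
    if not values:
        raise ValueError("values must not be empty")
    n_pos = Counter(values)[positive]
    return f"{n_pos} ({100 * n_pos / len(values):.0f}%)"


# Illustrative data chosen to match the counts in the tables above.
smoker = [1] * 15 + [0] * 5
ascites = [1] * 3 + [0] * 17

print("Smoker ", 1, positive_row(smoker))
print("Ascites", 1, positive_row(ascites))
```

Because the two levels are complementary, the omitted row can always be recovered from the group size and the row that is shown.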

tompollard commented 6 years ago

Also requested by @theonesp - we'll try to implement soon!

tompollard commented 4 years ago

Sorry for the delay in getting to this. The limit and order arguments can now (from v0.6.6) be combined to display binary data in a single row: order puts the level "1" first, and limit keeps only that first row.

Example below:

# import libraries
from tableone import TableOne
import pandas as pd

# load sample data into a pandas dataframe
url = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"
data = pd.read_csv(url)

# columns to summarize
columns = ['Age', 'SysABP', 'death']

# columns containing categorical variables
categorical = ['death']

# non-normal variables
nonnormal = ['Age']

# limit the binary variable "death" to a single row
limit = {"death": 1}

# set the order of the categorical variables
order = {"death": ["1"]}

# alternative labels
labels = {'death': 'Mortality'}

# set decimal places for age to 0
decimals = {"Age": 0}

# create tableone with the input arguments
mytable = TableOne(data, columns=columns, categorical=categorical, 
                   nonnormal=nonnormal, rename=labels, label_suffix=True, 
                   decimals=decimals, limit=limit, order=order)

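print(mytable.tabulate(tablefmt="github"))

|                     |    | Missing   | Overall      |
|---------------------|----|-----------|--------------|
| n                   |    |           | 1000         |
| Age, median [Q1,Q3] |    | 0         | 68 [53,79]   |
| SysABP, mean (SD)   |    | 291       | 114.3 (40.2) |
| Mortality, n (%)    | 1  | 0         | 136 (13.6)   |
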
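
If several binary columns need the same treatment, the two dictionaries could be built programmatically instead of by hand. The sketch below is only an illustration: `single_row_args` is a hypothetical helper, not part of tableone, and it assumes every listed column is coded 0/1 with the positive level written as the string "1", as in the example above. The column names "smoker" and "ascites" are placeholders taken from the original request and do not exist in the demo dataset.

```python
def single_row_args(binary_columns, positive_level="1"):
    """Build `limit` and `order` dicts that keep one row per binary column.

    Hypothetical helper for illustration; not part of tableone.
    Assumes each column's positive level is `positive_level`.
    """
    # Keep a single level per column ...
    limit = {col: 1 for col in binary_columns}
    # ... and make sure the positive level is the one listed first.
    order = {col: [positive_level] for col in binary_columns}
    return limit, order


# Placeholder column names; replace with the binary columns in your data.
limit, order = single_row_args(["death", "smoker", "ascites"])
print(limit)
print(order)
```

The resulting dictionaries would then be passed as limit=limit and order=order in the TableOne call, exactly as in the example above.
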
christian-geier commented 4 years ago

Awesome, I'll try this out soon. Thanks for continuing to improve this!