tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Unable to treat variable as continuous measure #109

Open ghost opened 3 years ago

ghost commented 3 years ago

Hello! Would just like to say fantastic package and great syntax for the function.

I seem to be having an issue with creating a table with continuous values. I'm sure I am probably doing something incorrectly on my end since it is basic functionality. When I try to do an easy example with a single continuous variable I get an output like below:

[Image: screenshot of the resulting table]

It is odd because it is clearly reading the variable as non-normal, as I have specified (indicated by the 'median [Q1, Q3]' label), but it only gives counts and frequencies, essentially treating it as categorical. I have also verified that the variable is of type float64. Are there any suggestions on how I can proceed and have it treated as a continuous measure?

Thanks in advance

tompollard commented 3 years ago

Hi @sgummidipundi, you've raised a good point, which is that there is no "continuous" argument. At the moment, tableone expects you to define the categorical variables using the "categorical" argument. Anything else is then treated as continuous. I can see how this is confusing, especially when (as in your case) there are no categorical variables.

If you don't specify which variables are categorical, then tableone attempts to guess (and, from your example, clearly doesn't do a great job!). In your example, you would need to provide an empty categorical argument. I've tried to recreate the example below:

1. Generate sample data

# import packages
import pandas as pd
import tableone
# create sample dataframe
x = ([0.0] * 41639 + 
     [0.2] * 3 +
     [0.25] * 1 +
     [1] * 3 +
     [10] * 806 +
     [100] * 816 +
     [1000] * 1488 +
     [10000] * 57 +
     [100000] * 3 +
     [11000] * 2 +
     [117000] * 7 +
     [12] * 1 +
     [1200] * 267 +
     [12000] * 51)

data = pd.DataFrame(x, columns=["x"])

2. Create summary table, allowing tableone to guess the data type

Based on the large number of observations and the limited number of unique values, tableone (incorrectly!) guesses that x is categorical.

t1 = tableone.tableone(data)
print(t1.tabulate(tablefmt = "github"))
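|          |          | Missing   | Overall      |
|----------|----------|-----------|--------------|
| n        |          |           | 45144        |
| x, n (%) | 0.0      | 0         | 41639 (92.2) |
|          | 0.2      |           | 3 (0.0)      |
|          | 0.25     |           | 1 (0.0)      |
|          | 1.0      |           | 3 (0.0)      |
|          | 10.0     |           | 806 (1.8)    |
|          | 100.0    |           | 816 (1.8)    |
|          | 1000.0   |           | 1488 (3.3)   |
|          | 10000.0  |           | 57 (0.1)     |
|          | 100000.0 |           | 3 (0.0)      |
|          | 11000.0  |           | 2 (0.0)      |
|          | 117000.0 |           | 7 (0.0)      |
|          | 12.0     |           | 1 (0.0)      |
|          | 1200.0   |           | 267 (0.6)    |
|          | 12000.0  |           | 51 (0.1)     |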
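
To illustrate why a float column like this can be mistaken for a categorical one, here is a toy heuristic based on the ratio of unique values to observations. This is only a sketch and not necessarily how tableone makes its guess: the function name `looks_categorical` and the `max_unique_ratio` threshold of 0.005 are assumptions made for illustration.

```python
# Toy illustration only: NOT tableone's confirmed detection logic.
# `looks_categorical` and `max_unique_ratio` are hypothetical names/values.

def looks_categorical(values, max_unique_ratio=0.005):
    """Flag a numeric column as categorical when it has very few
    distinct values relative to its number of non-missing observations."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return False
    unique_ratio = len(set(non_missing)) / len(non_missing)
    return unique_ratio < max_unique_ratio


# Same sample values as in step 1
x = ([0.0] * 41639 + [0.2] * 3 + [0.25] * 1 + [1] * 3 + [10] * 806 +
     [100] * 816 + [1000] * 1488 + [10000] * 57 + [100000] * 3 +
     [11000] * 2 + [117000] * 7 + [12] * 1 + [1200] * 267 + [12000] * 51)

# 14 distinct values across 45144 observations gives a very small ratio,
# so the toy heuristic flags the column as categorical.
print(len(set(x)), len(x), looks_categorical(x))
```

A rule of this kind works for coded variables such as 0/1 flags, but it misfires on continuous measurements that happen to cluster on a handful of values, which is the situation in this issue.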

3. Create summary table with an empty categorical argument

t2 = tableone.tableone(data, categorical=[])
print(t2.tabulate(tablefmt = "github"))
|              |    | Missing   | Overall       |
|--------------|----|-----------|---------------|
| n            |    |           | 45144         |
| x, mean (SD) |    | 0         | 93.5 (1764.8) |
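
As a sanity check on the continuous summary, the same statistics can be recomputed with only the Python standard library. This sketch rebuilds the sample from step 1 (the `counts` dictionary is just a compact restatement of that data) and assumes the SD shown is the sample standard deviation. It also computes the quartiles, since the original question asked for 'median [Q1, Q3]'. In tableone itself, listing the column in the `nonnormal` argument alongside `categorical=[]` should give that style of row, though that is not shown in this thread.

```python
import statistics

# Value -> number of occurrences, restating the sample data from step 1
counts = {0.0: 41639, 0.2: 3, 0.25: 1, 1: 3, 10: 806, 100: 816,
          1000: 1488, 10000: 57, 100000: 3, 11000: 2, 117000: 7,
          12: 1, 1200: 267, 12000: 51}
x = [float(value) for value, n in counts.items() for _ in range(n)]

# Mean and sample standard deviation (assumed to match the "mean (SD)" row)
mean = statistics.fmean(x)
sd = statistics.stdev(x)
print(f"mean (SD): {mean:.1f} ({sd:.1f})")

# Quartiles: more than 75% of the observations are 0,
# so Q1, the median and Q3 all collapse to 0.
q1, median, q3 = statistics.quantiles(x, n=4)
print(f"median [Q1,Q3]: {median} [{q1},{q3}]")
```

Because over 92% of the values are zero, a median-based summary of this variable is not very informative, which may be worth keeping in mind when choosing between the two styles of row.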