tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Continuous variable not appearing correctly #82

Closed (adhaimovich closed 4 years ago)

adhaimovich commented 5 years ago

Hi team - I've been through the docs and the relevant questions here. I am having trouble getting ages (data frame column type float) to show up as mean/std or IQRs. I just get the counts per year.

table = TableOne(df, columns=['age','gender'])

Would appreciate any support!

Thanks, Adrian

tompollard commented 5 years ago

@adhaimovich, if you aren't already running the latest version of tableone, please upgrade first. The current version is 0.6.0.

>> import tableone  
>> print(tableone.__version__) 
# 0.6.0

Here's a simple example of a dataframe with age (float) and gender (string):

>> import pandas as pd
>> from tableone import TableOne

>> df = pd.DataFrame({'age': [20.1, 30.2, 40.3, 50.4],
                      'gender': ['M', 'F', 'M', 'U']})
>> print(df)

#     age gender
# 0  20.1      M
# 1  30.2      F
# 2  40.3      M
# 3  50.4      U

>> table = TableOne(df, columns=['age', 'gender'], label_suffix=True)   
>> print(table) 

#                      isnull      overall
# variable       level                    
# n                                      4
# age, mean (SD)            0  35.2 (13.0)
# gender, n (%)  F          0     1 (25.0)
#                M                2 (50.0)
#                U                1 (25.0)

One possibility is that your ages are stored as strings rather than floats, e.g.:

>> df2 = pd.DataFrame({'age': ['20.1', '30.2', '40.3', '50.4'],
                       'gender': ['M', 'F', 'M', 'U']})

>> table2 = TableOne(df2, columns=['age', 'gender'], label_suffix=True)   
>> print(table2) 

#                    isnull   overall
# variable      level                 
# n                                  4
# age, n (%)    20.1       0  1 (25.0)
#               30.2          1 (25.0)
#               40.3          1 (25.0)
#               50.4          1 (25.0)
# gender, n (%) F          0  1 (25.0)
#               M             2 (50.0)
#               U             1 (25.0)
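A quick way to check whether this is what is happening is to look at the column dtypes before building the table. The snippet below is a minimal sketch using only pandas; it reuses the `df2` example above, and the variable name `age_is_numeric` is just for illustration.

```python
import pandas as pd

# Same data as df2 above: ages stored as strings, not floats.
df2 = pd.DataFrame({'age': ['20.1', '30.2', '40.3', '50.4'],
                    'gender': ['M', 'F', 'M', 'U']})

# Inspect the column dtypes; string columns typically show up as
# 'object' (or a dedicated string dtype), not float64.
print(df2.dtypes)

# True only if the column has a numeric dtype (int, float, ...).
age_is_numeric = pd.api.types.is_numeric_dtype(df2['age'])
print(age_is_numeric)
# False
```

If this prints `False` for a column you expect to be continuous, the values are probably being read as strings.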

You could fix this by (1) changing the data type before generating the table, which would be best, or (2) explicitly naming the categorical variables using the categorical argument, so that age is no longer treated as categorical.

>> table3 = TableOne(df2, columns=['age', 'gender'], categorical=['gender'], 
                     label_suffix=True)   
>> print(table3) 

#                      isnull      overall
# variable       level                    
# n                                      4
# age, mean (SD)            0  35.2 (13.0)
# gender, n (%)  F          0     1 (25.0)
#                M                2 (50.0)
#                U                1 (25.0)
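For option (1), changing the data type, one possible approach is `pd.to_numeric`. This is a sketch under the assumption that the age strings are all parseable numbers; with `errors='coerce'`, any value that cannot be parsed becomes NaN instead of raising an error, so it is worth checking for unexpected missing values afterwards.

```python
import pandas as pd

# Same data as df2 above: ages stored as strings.
df2 = pd.DataFrame({'age': ['20.1', '30.2', '40.3', '50.4'],
                    'gender': ['M', 'F', 'M', 'U']})

# Convert the age column to a numeric dtype.
# errors='coerce' turns unparseable entries into NaN rather than raising.
df2['age'] = pd.to_numeric(df2['age'], errors='coerce')

print(df2['age'].dtype)
# float64

# Count values that could not be converted (0 for this example data).
n_failed = int(df2['age'].isnull().sum())
print(n_failed)
# 0
```

The converted dataframe can then be passed to TableOne without the categorical argument, as in the first example.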

Hope this helps! If not, please provide a sample dataframe to help us to reproduce the issue.

adhaimovich commented 5 years ago

Thank you, this very helpful response solved my problem!
