tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Duplicate index values in dataframe gives unexpected results #101

Closed rutgervandeleur closed 4 years ago

rutgervandeleur commented 4 years ago

There seems to be an issue where, when two dataframes are concatenated (for example, one with cases and one with controls), the means of continuous variables are calculated incorrectly. This only happens when the results are grouped (by case/control, for example). It seems to be caused by duplicate index values in the concatenated dataframe, which are then used for the mean calculation in both groups.

Example:

import pandas as pd
from tableone import TableOne

d_control = pd.DataFrame(data={'group': [0, 0, 0, 0, 0, 0, 0], 'value': [3, 4, 4, 4, 4, 4, 5]})
d_case = pd.DataFrame(data={'group': [1, 1, 1], 'value': [1, 2, 3]})
d = pd.concat([d_case, d_control])

# Calculate mean per group (gives the correct values)
d.groupby('group').mean()

# Using tableone (gives the wrong values due to duplicate indices)
table = TableOne(d, ['value'], groupby = 'group', pval=True)
print(table.tabulate(tablefmt="github"))
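|       | Missing | 0         | 1         | P-Value | Test              |
|-------|---------|-----------|-----------|---------|-------------------|
| n     |         | 7         | 3         |         |                   |
| value | 0       | 3.4 (1.2) | 2.8 (1.2) | 0.059   | Two Sample T-test |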
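
For anyone wanting to see the effect without tableone, here is a minimal pandas-only sketch. It assumes the per-group values are being picked out by index label, which is a guess at the mechanism and not a trace of tableone's internals. With the example data, label-based selection gives group means of about 3.4 and 2.8, which is consistent with the table above. The variable names are only illustrative.

```python
import pandas as pd

d_control = pd.DataFrame({'group': [0] * 7, 'value': [3, 4, 4, 4, 4, 4, 5]})
d_case = pd.DataFrame({'group': [1] * 3, 'value': [1, 2, 3]})
d = pd.concat([d_case, d_control])

# Both source frames start their index at 0, so labels 0, 1 and 2 occur twice.
print(d.index.is_unique)

# Boolean masks as plain arrays, so no index alignment is involved.
is_control = (d['group'] == 0).to_numpy()
is_case = (d['group'] == 1).to_numpy()

# Assumed mechanism: look up each group's rows by their index labels.
# With duplicate labels, .loc also returns rows of the other group.
mean_control = d.loc[d.index[is_control], 'value'].mean()
mean_case = d.loc[d.index[is_case], 'value'].mean()
print(round(mean_control, 1), round(mean_case, 1))

# Selecting with the boolean mask itself is positional and stays correct.
mean_control_ok = d.loc[is_control, 'value'].mean()
mean_case_ok = d.loc[is_case, 'value'].mean()
print(mean_control_ok, mean_case_ok)
```

The mask-based selection returns 4.0 and 2.0, the same values as d.groupby('group').mean().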

tompollard commented 4 years ago

Thanks @rutgervandeleur, we'll look into this.

tompollard commented 4 years ago

@rutgervandeleur many thanks for picking this up. As you say, in some cases duplicate values in the index of the input dataset will produce incorrect results.

In version 0.7.7 (now on PyPI at https://pypi.org/project/tableone/0.7.7/), we check for duplicate values in the index and raise an exception if any are found.
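
A comparable check can be run on your own data before calling TableOne. The sketch below is not the code that ships in tableone: the helper name check_unique_index, the ValueError type and the message text are all assumptions made for illustration. Only the pandas calls (Index.is_unique, Index.duplicated) are real.

```python
import pandas as pd


def check_unique_index(data: pd.DataFrame) -> None:
    """Hypothetical helper: raise if the dataframe index has duplicates."""
    if not data.index.is_unique:
        # Collect each repeated label once for the error message.
        dupes = data.index[data.index.duplicated()].unique().tolist()
        raise ValueError(f"Input data has duplicate index values: {dupes}")


d_control = pd.DataFrame({'group': [0] * 7, 'value': [3, 4, 4, 4, 4, 4, 5]})
d_case = pd.DataFrame({'group': [1] * 3, 'value': [1, 2, 3]})

try:
    check_unique_index(pd.concat([d_case, d_control]))
    raised = False
except ValueError as err:
    raised = True
    print(err)

# A frame with a fresh index passes silently.
check_unique_index(pd.concat([d_case, d_control], ignore_index=True))
```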

In your example, the fix is now to ignore the indexes when concatenating the datasets:

d = pd.concat([d_case, d_control], ignore_index=True)

...or to reset the index after the dataset has been created:

d = pd.concat([d_case, d_control])
d = d.reset_index(drop=True) 
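
As a quick sanity check, the two approaches above can be compared directly in pandas. This is only a sketch using the example data from this issue; the names d_ignore and d_reset are illustrative.

```python
import pandas as pd

d_control = pd.DataFrame({'group': [0] * 7, 'value': [3, 4, 4, 4, 4, 4, 5]})
d_case = pd.DataFrame({'group': [1] * 3, 'value': [1, 2, 3]})

# Option 1: discard the original indexes while concatenating.
d_ignore = pd.concat([d_case, d_control], ignore_index=True)

# Option 2: concatenate first, then replace the index.
d_reset = pd.concat([d_case, d_control]).reset_index(drop=True)

# Both frames end up with a unique 0..9 index and identical contents.
print(d_ignore.index.is_unique, d_reset.index.is_unique)
print(d_ignore.equals(d_reset))

# Group means computed on the re-indexed data.
means = d_ignore.groupby('group')['value'].mean()
print(means.to_dict())
```

Both options give group means of 4.0 for group 0 and 2.0 for group 1.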

Thanks again for raising the issue!

rutgervandeleur commented 4 years ago

Thank you for fixing this so quickly, it works as expected now!

tompollard commented 4 years ago

Thanks @rutgervandeleur, glad to hear this helped. Please do raise new issues for other bugs, annoyances, ideas for improvement, etc., and we will try to address them.