tompollard closed this 5 months ago
We found this error in https://github.com/MIT-LCP/hack-aotearoa/blob/master/03_summary_statistics.ipynb. Looking at it a little more closely, the following chunk is fine:
```python
import pandas as pd
from tableone import TableOne, load_dataset

d = {'col1': [1, 2, 4, 5],
     'col2': [3, 4, 5, 6],
     'outcome': [0, 1, 1, 0]}
df = pd.DataFrame(data=d)

# Every column uses the NumPy int64 dtype
print(df.dtypes)
# col1       int64
# col2       int64
# outcome    int64
# dtype: object

TableOne(df, columns=['col1', 'col2'], groupby='outcome')
# works fine
```
Converting one of the `int64` columns to the nullable `Int64` dtype raises the error:
```python
df2 = df.astype({"col1": "Int64"})
TableOne(df2, columns=['col1', 'col2'], groupby='outcome')
```
@ngphubinh
The error is raised with `pandas==1.4.3`. When running `pandas>=2.0.0`, the error is not raised and everything seems to work fine.
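Given the versions reported here, a script could check whether it is running on a pre-2.0 pandas before passing `Int64` columns to `TableOne`. This is a minimal sketch: `needs_int64_workaround` is a hypothetical helper, and treating every pandas 1.x release as affected is an assumption, since only 1.4.3 was reported.

```python
import pandas as pd


def needs_int64_workaround(version: str = pd.__version__) -> bool:
    """Hypothetical helper: True if the given pandas version predates 2.0.0.

    The thread reports the Int64 error with pandas 1.4.3 and no error with
    pandas >= 2.0.0. Assuming that all 1.x releases are affected is a guess.
    """
    # Only the major component matters for this check, e.g. "1.4.3" -> 1
    major = int(version.split(".")[0])
    return major < 2


if needs_int64_workaround():
    print(f"pandas {pd.__version__}: Int64 columns may trigger the TableOne error")
```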
Bumping pandas fixed the issue: https://github.com/tompollard/tableone/pull/165
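Where upgrading pandas is not an option, one untested alternative (not proposed in this thread) is to cast nullable integer columns such as `Int64` back to NumPy dtypes before building the table. The helper `to_numpy_dtypes` below is hypothetical, and whether it avoids the error on pandas 1.4.3 is an assumption; upgrading, as in the linked PR, remains the route that was actually tested.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_signed_integer_dtype


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which signed nullable
    integer columns (Int8 ... Int64) are cast to NumPy dtypes.

    Columns without missing values become int64; columns containing pd.NA
    become float64, with the missing entries stored as NaN.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Only touch pandas extension dtypes; plain NumPy int64 is left alone
        if isinstance(dtype, ExtensionDtype) and is_signed_integer_dtype(dtype):
            target = "float64" if out[col].isna().any() else "int64"
            out[col] = out[col].astype(target)
    return out


d = {'col1': [1, 2, 4, 5],
     'col2': [3, 4, 5, 6],
     'outcome': [0, 1, 1, 0]}
df2 = pd.DataFrame(data=d).astype({"col1": "Int64"})
df3 = to_numpy_dtypes(df2)
print(df3.dtypes)
# Hypothetical next step: TableOne(df3, columns=['col1', 'col2'], groupby='outcome')
```

Casting to `float64` only when missing values are present keeps fully populated columns as integers, which is what `TableOne` received in the working example.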
When run in a Colab notebook (specifically https://github.com/MIT-LCP/hack-aotearoa/blob/main/03_summary_statistics.ipynb), the notebook's summary-statistics code raises a data type error related to `Int64`-formatted values.