tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

tableone.TableOne with categorical pandas DataFrame column raises TypeError #177

Status: Closed (eroell closed this issue 3 months ago)

eroell commented 3 months ago

Hi,

With the new release 0.9.0, I run into the following issue when a pandas DataFrame contains a column of categorical dtype:

import tableone
import pandas as pd

dummy_table = pd.DataFrame(
    {
        "age": [70, 80, 90, 85],
        "sex": ["m", "f", "m", "f"]
    }
)
dummy_table["sex"] = dummy_table["sex"].astype("category")

tableone.TableOne(dummy_table)

raises

TypeError: Cannot setitem on a Categorical with a new category (None), set the categories first

The same example works just fine when omitting dummy_table["sex"] = dummy_table["sex"].astype("category"), that is when the column type is "object".
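For context, pandas itself raises this kind of error whenever a value that is not among a categorical column's categories is written into it. The sketch below reproduces that in plain pandas, without tableone. It is not claimed to be the code path inside tableone: the "None" fill label and the `fill_missing` helper are assumptions made for illustration only.

```python
import pandas as pd

# A categorical column with one missing value; its categories are only "f" and "m".
sex = pd.Series(["m", "f", None, "f"], dtype="category")


def fill_missing(series, label):
    """Fill missing values with `label` (hypothetical helper).

    For categorical columns, `label` is registered as a category first,
    because pandas refuses to write values outside the known categories.
    """
    if isinstance(series.dtype, pd.CategoricalDtype) and label not in series.cat.categories:
        series = series.cat.add_categories([label])
    return series.fillna(label)


# Filling with an unknown label directly is rejected by pandas.
# Recent pandas versions raise TypeError; some older ones raised ValueError.
try:
    sex.fillna("None")
    raised = False
except (TypeError, ValueError) as err:
    raised = True
    print(type(err).__name__, err)

# Registering the label as a category first makes the same fill succeed.
filled = fill_missing(sex, "None")
print(filled.tolist())
```

An object column has no fixed set of allowed values, which is consistent with the example working once the astype("category") line is omitted.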

Environment: Python 3.11.9, with the following packages (pip list):

Package         Version
--------------- -----------
et-xmlfile      1.1.0
Jinja2          3.1.4
MarkupSafe      2.1.5
numpy           1.26.4
openpyxl        3.1.4
packaging       24.1
pandas          2.2.2
patsy           0.5.6
pip             24.0
python-dateutil 2.9.0.post0
pytz            2024.1
scipy           1.13.1
setuptools      65.5.0
six             1.16.0
statsmodels     0.14.2
tableone        0.9.0
tabulate        0.9.0
tzdata          2024.1

I have not yet looked into why this happens. With tableone 0.8.0 the example works; pandas is at 2.2.2 in both the working (tableone 0.8.0) and the failing (tableone 0.9.0) setup.

Is this a bug, or has the input rule been made stricter for a reason?

Best,

tompollard commented 3 months ago

Apologies, thanks for flagging this!

Is this a bug, or has the input rule been made stricter for a reason?

It's a bug, I think caused by the introduction of the include_null argument in: https://github.com/tompollard/tableone/pull/175

I'll fix it today or tomorrow, but in the meantime you may find that setting include_null=False resolves the issue. This switches back to the old behaviour.
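For anyone stuck on 0.9.0, a second stopgap follows from the original report, which notes that the example works when the column is of object dtype: cast categorical columns to object before building the table. The sketch below uses only pandas. The `categoricals_to_object` helper is a hypothetical name, not part of tableone, and the tableone calls are left as comments so the snippet runs without tableone installed.

```python
import pandas as pd


def categoricals_to_object(df):
    """Return a copy of `df` with every categorical column cast to object dtype."""
    cat_cols = df.select_dtypes(include="category").columns
    return df.astype({col: object for col in cat_cols})


# Same data as in the report above.
dummy_table = pd.DataFrame(
    {
        "age": [70, 80, 90, 85],
        "sex": ["m", "f", "m", "f"],
    }
)
dummy_table["sex"] = dummy_table["sex"].astype("category")

plain = categoricals_to_object(dummy_table)
print(plain.dtypes)

# Workaround 1 (suggested above): keep the categorical column, disable include_null.
#   tableone.TableOne(dummy_table, include_null=False)
# Workaround 2 (this sketch): pass the object-typed copy instead.
#   tableone.TableOne(plain)
```

The cast returns a copy and leaves the original DataFrame untouched, so the categorical dtype is still available for other parts of an analysis.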

tompollard commented 3 months ago

Thanks again @eroell. Should be fixed if you bump the version to 0.9.1: https://pypi.org/project/tableone/0.9.1/

eroell commented 3 months ago

Thanks a lot for the fast fix, @tompollard! I can confirm that bumping to 0.9.1 resolved this issue.

tompollard commented 3 months ago

Thanks! Feel free to raise issues if there are other bug fixes or features that you'd like to see.