ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

`infer_dtypes` doesn't appear to work #935

Open mbkupfer opened 2 years ago

mbkupfer commented 2 years ago

Describe the bug

My use case is tables that have an ID column containing integers, which should really be treated as a categorical variable. However, when I set `infer_dtypes=False` on the profile report, it still treats the column as numeric.

To Reproduce

>>> from string import ascii_lowercase
>>> import pandas as pd
>>> import numpy as np
>>> # newer releases ship this as: from ydata_profiling import ProfileReport
>>> from pandas_profiling import ProfileReport
>>> # "ids" holds integer IDs that should be profiled as a categorical variable
>>> df = pd.DataFrame(
...     {
...         "letters": np.random.choice(list(ascii_lowercase), size=100),
...         "ids": range(0, 100),
...     },
...     dtype=object,
... )
>>> profile = ProfileReport(df, infer_dtypes=False)
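One plausible reason the setting seems to have no effect: even though the frame is built with `dtype=object`, the cells in `ids` are still integer objects, so any value-based type check will see integers. The sketch below uses pandas' own `pandas.api.types.infer_dtype` to show this; it is pandas' inference, not ydata-profiling's (which is built on the visions library), so it only illustrates the idea rather than reproducing the profiler's logic.

```python
from string import ascii_lowercase

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Rebuild the frame from the report: both columns are stored with object dtype.
df = pd.DataFrame(
    {
        "letters": np.random.choice(list(ascii_lowercase), size=100),
        "ids": range(0, 100),
    },
    dtype=object,
)

# The storage dtype is object for both columns...
print(df.dtypes.to_dict())

# ...but inspecting the values still finds integers in "ids".
print(infer_dtype(df["ids"]))      # integer
print(infer_dtype(df["letters"]))  # string
```

In other words, `dtype=object` changes how pandas stores the column, not what the values are, which may be why the ID column still ends up profiled as numeric.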

Data: n/a

Code: n/a (see above)

Version information:

Additional context

sbrugman commented 2 years ago

Will look into this

mbkupfer commented 2 years ago

@sbrugman, just wanted to check in on where this stands. Have you found the cause of the bug? Let me know if you need any help or clarification, or want to discuss.

Ayoyinka-Sofuwa commented 2 years ago

Just tested the code above: with `infer_dtypes=False`, the profile report still treats the `ids` column as an integer.

Is there a way to change the data type manually, like the `astype()` function in pandas?
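A possible workaround along the lines of that question is to cast the ID column yourself before building the report. This is a sketch, not a confirmed fix: it assumes the profiler maps pandas' `category` dtype to a categorical variable, which nobody in this thread has verified, and `as_categorical` is a hypothetical helper name.

```python
import pandas as pd


def as_categorical(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given columns cast to pandas' category dtype.

    Hypothetical helper; the input frame is left unchanged.
    """
    out = df.copy()
    for col in columns:
        # astype("category") keeps the values but marks the column as categorical.
        # astype(str) is an alternative if the category dtype is not honored.
        out[col] = out[col].astype("category")
    return out


df = pd.DataFrame({"letters": list("abcde"), "ids": range(5)})
prepared = as_categorical(df, ["ids"])
print(prepared.dtypes)

# Then, assuming the dtype is respected:
# profile = ProfileReport(prepared, infer_dtypes=False)
```

Returning a copy keeps the original integer IDs available for joins or lookups while only the profiled frame carries the categorical dtype.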