sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
1.97k stars 201 forks

Need data-type for each column in create_report function. #953

Open anthng opened 1 year ago

anthng commented 1 year ago

Hi all. I currently need a data-type (dtype) parameter in create_report(), like the one in the plot() function. Specifying the data type would let me generate a report with numerical/categorical features independently of "Distinct Count".

The image below was automatically generated by create_report. However, my expected output is numerical stats and visualizations. [image]

My expected feature:

dttype = {c: "Continuous" for c in dataframe.columns}
create_report(dataframe, dtype=dttype)

If there is any solution to my problem, please let me know. Thanks.

dovahcrow commented 1 year ago

I see. So it seems DataPrep automatically identified your columns as categorical. May I ask what is the output of dataframe.dtypes?

anthng commented 1 year ago

I cast all columns of the dataframe to float before creating the report, so dataframe.dtypes shows float for every column. In the example above, I attempted to specify "Continuous", but it does not work. I guess that DataPrep automatically identifies whether a feature is numerical or categorical based on its "distinct count" and "data type", but I am not sure about this.
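
To make the guess above concrete, here is a minimal sketch in plain pandas of what a "dtype plus distinct count" rule could look like. It is not DataPrep's actual detection logic: the guess_kind helper, the max_distinct threshold, and the sample columns are all hypothetical and exist only for illustration. It also prints the dtypes and distinct counts side by side, which is the information asked for earlier in the thread.

```python
import pandas as pd


def guess_kind(series: pd.Series, max_distinct: int = 10) -> str:
    """Hypothetical rule, NOT DataPrep's implementation.

    Non-numeric columns are treated as categorical; numeric columns with
    few distinct values are also treated as categorical.
    """
    if not pd.api.types.is_numeric_dtype(series):
        return "categorical"
    if series.nunique(dropna=True) <= max_distinct:
        return "categorical"
    return "numerical"


# Made-up sample data: "rating" is float but has only three distinct values.
df = pd.DataFrame({
    "rating": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "price": [10.5, 11.2, 9.8, 14.1, 13.3, 12.7],
    "city": ["a", "b", "a", "c", "b", "a"],
})

# Show dtype, distinct count, and the guessed kind for each column.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "distinct": df.nunique(),
    "guess": pd.Series({c: guess_kind(df[c], max_distinct=3) for c in df.columns}),
})
print(summary)
```

Under a rule like this, casting a column to float would not be enough on its own: a float column with only a handful of distinct values would still be reported as categorical, which would match the behavior described in this issue if DataPrep does something similar.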