ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License
12.39k stars 1.67k forks

Feature Request: Configure which Alerts to be rejected #1642

Open daso94msg opened 4 weeks ago

daso94msg commented 4 weeks ago

Missing functionality

Right now it is only possible to switch the rejection of variables on or off entirely, using the configuration variable reject_variables.

Proposed feature

I would like to be able to reject only some types of variables. For example, reject missing variables, but still show constant variables normally in the report.
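
As a minimal sketch of the requested behaviour, the idea could look roughly like the following. This is plain Python, not ydata-profiling's API: the names `REJECT_ON`, `is_missing`, and `classify_columns` are hypothetical, and columns are modelled as plain lists to keep the example dependency-free.

```python
# Hypothetical setting: the alert types that should cause a column to be rejected.
# "constant" is deliberately left out, so constant columns stay visible.
REJECT_ON = {"missing"}


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and value != value)


def classify_columns(columns, reject_on=REJECT_ON):
    """Map each column name to a tuple (alert, rejected).

    alert is "missing" if every value is missing, "constant" if all
    non-missing values are identical, and None otherwise.
    """
    result = {}
    for name, values in columns.items():
        present = [v for v in values if not is_missing(v)]
        if not present:
            alert = "missing"
        elif len(set(present)) == 1:
            alert = "constant"
        else:
            alert = None
        # A column is rejected only if its alert type is in the configured set.
        result[name] = (alert, alert in reject_on)
    return result


example = {
    "empty": [None, None, None],
    "const": [7, 7, 7],
    "mixed": [1, 2, 3],
}
report = classify_columns(example)
```

With this configuration, only the all-missing column would be marked as rejected, while the constant column would still carry its alert but be shown normally.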

Alternatives considered

I'm not sure about a workaround. To me it seems to depend on the typeset, as these variables all fall back to the Unsupported base type.

Additional context

No response

fabclmnt commented 3 weeks ago

Hi @daso94msg,

Thank you for your feature request. However, the end goal and utility of the request are not entirely clear.

Can you please provide a practical example?

daso94msg commented 3 weeks ago

We use the reports to get a first glance at unknown data sets. A column that consists only of NA values is indeed not that interesting. For a constant column, on the other hand, it could still be interesting to check whether that constant value is plausible. But in the end both get grayed out in the report as rejected, which tempts you not to look into them at all.

If it's not a feature worth pursuing for you, could you give me an idea of where to look? Where does the rejection happen? I already looked in ./ydata_profiling/model/alerts.py and ./model/pandas/summary_pandas.py, where I found that it depends on whether the description["type"] of a series_description is "Unsupported" or not.