Open npatki opened 5 months ago
Another use case: the visualization phase after a Quality Report is generated.
If a table has a large number of columns, the generated visualizations become hard to interact with and use for insight gathering. This is an example from the loan_applications dataset:
If I want to focus on ~10 columns in the Quality Report, there is no easy way to do this natively. Potential solutions here could either manifest as:
Problem Description
As described in #546, I may want to ignore certain columns in a dataset when running a report (quality or diagnostic). It is not completely intuitive how to do this.
Actual Solution: If you mark a column with an "other" sdtype (not categorical, numerical, datetime, etc.), then SDV will assume it is non-statistical PII and therefore ignore the column. For example, using sdtype 'text' is sufficient to get a report to ignore the column.
Expected behavior
The metadata spec should probably remain as-is, because in the future we may decide to add metrics for specific sdtypes.
However, perhaps the report itself should allow you to specify which columns to ignore?
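For anyone who needs this today, the sdtype workaround described under "Actual Solution" can be sketched roughly as below. This is only an illustration, not an SDV or SDMetrics feature: the helper name `mark_columns_ignored`, the column names, and the metadata contents are all made up, and it assumes the single-table metadata is a plain dict with a `columns` mapping. It works on a copy, so the original metadata stays as-is.

```python
import copy


def mark_columns_ignored(metadata, columns_to_ignore, placeholder_sdtype="text"):
    """Return a copy of the metadata dict with the given columns re-typed.

    Hypothetical helper: it only rewrites the dict, it does not call SDV.
    """
    updated = copy.deepcopy(metadata)  # leave the caller's metadata untouched
    for name in columns_to_ignore:
        if name not in updated["columns"]:
            raise KeyError(f"Column {name!r} is not in the metadata")
        # Replace the column entry with an "other" sdtype so that the
        # report treats it as non-statistical and skips it.
        updated["columns"][name] = {"sdtype": placeholder_sdtype}
    return updated


# Made-up metadata for illustration only.
metadata = {
    "columns": {
        "loan_id": {"sdtype": "id"},
        "amount": {"sdtype": "numerical"},
        "purpose": {"sdtype": "categorical"},
    }
}

report_metadata = mark_columns_ignored(metadata, ["purpose"])

# The copy would then be passed to the report in place of the original, e.g.
# (assuming the SDMetrics single-table report API):
#   report = QualityReport()
#   report.generate(real_data, synthetic_data, report_metadata)
```

Keeping a separate report-only copy means the metadata used for modeling is not affected, which fits the point above that the metadata spec itself should remain as-is.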