SDMetrics version: Dev Branch (for upcoming 0.11.0)
Problem description
Some datasets may not have many statistical columns, so there may not be enough data for certain types of evaluation.
_Statistical columns are the modeled ones: categorical, boolean, datetime, numerical._
For example, consider a multi-table dataset where a table has only 1 numerical column (the others are all ID or PII types). If I run the quality report, there is nothing to compute for Column Pair Trends, because this property requires 2 or more statistical columns in the same table.
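To make the condition concrete, here is a minimal sketch of how one might check it up front. It assumes the metadata is a plain dictionary shaped like `{'tables': {name: {'columns': {col: {'sdtype': ...}}}}}`; the helper names `count_statistical_columns` and `can_compute_column_pair_trends` are hypothetical and not part of SDMetrics.

```python
# Sdtypes treated as "statistical" in this issue (the modeled ones).
STATISTICAL_SDTYPES = {'categorical', 'boolean', 'datetime', 'numerical'}


def count_statistical_columns(metadata):
    """Return {table_name: number of statistical columns} for a metadata dict.

    Assumes the layout {'tables': {name: {'columns': {col: {'sdtype': ...}}}}}.
    """
    counts = {}
    for table_name, table_meta in metadata.get('tables', {}).items():
        columns = table_meta.get('columns', {})
        counts[table_name] = sum(
            1 for column in columns.values()
            if column.get('sdtype') in STATISTICAL_SDTYPES
        )
    return counts


def can_compute_column_pair_trends(metadata):
    """True if at least one table has 2 or more statistical columns."""
    return any(n >= 2 for n in count_statistical_columns(metadata).values())


# Hypothetical table: one numerical column, the rest ID or PII.
metadata = {
    'tables': {
        'users': {
            'columns': {
                'user_id': {'sdtype': 'id'},
                'email': {'sdtype': 'email'},
                'age': {'sdtype': 'numerical'},
            }
        }
    }
}

print(count_statistical_columns(metadata))
print(can_compute_column_pair_trends(metadata))
```

For this example the only statistical column is `age`, so the check reports that no column pair is available.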
Observed
The report correctly identifies that there is nothing to compute. As expected, the details are blank.
report.get_details('Column Pair Trends')
However, the visualization is also blank because there was nothing computed.
Expected
It is a bit odd to see this type of visualization. Perhaps we could render a single blank graph with a text overlay that says something like:
"No data to display. This property requires at least 2 statistical columns within a single table."
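A rough sketch of what such a placeholder could look like, assuming the visualization is a Plotly figure. The helper `empty_property_layout` is hypothetical; it only builds a layout dictionary from standard Plotly layout keys (hidden axes plus a centered annotation) and does not reflect how SDMetrics builds its figures.

```python
NO_DATA_MESSAGE = (
    'No data to display. This property requires at least 2 '
    'statistical columns within a single table.'
)


def empty_property_layout(message):
    """Build a Plotly-style layout dict for a blank graph with a text overlay."""
    return {
        # Hide both axes so only the message is visible.
        'xaxis': {'visible': False},
        'yaxis': {'visible': False},
        'plot_bgcolor': 'white',
        'annotations': [
            {
                'text': message,
                # 'paper' coordinates span the whole plotting area (0 to 1).
                'xref': 'paper',
                'yref': 'paper',
                'x': 0.5,
                'y': 0.5,
                'showarrow': False,
                'font': {'size': 16},
            }
        ],
    }


layout = empty_property_layout(NO_DATA_MESSAGE)
```

With Plotly installed, passing this dictionary as `plotly.graph_objects.Figure(layout=layout)` should give a single empty graph with the message centered on it.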
Additional Context
It may be worth auditing and revisiting other cases where this can trigger as well. For example:
- Quality report, Column Pair Trends property: if there are only categorical columns, there is nothing to display in the heatmaps
- Quality report, Column Shapes property: if there are no statistical columns
- Diagnostic report, all properties: if there are no statistical columns
- Diagnostic report, boundaries property: if there are only categorical columns, this is not computed
Environment details