sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License

Is there a better way to render a visualization when no data exists? #402

Open npatki opened 1 year ago

npatki commented 1 year ago

Environment details

Problem description

Some datasets may not have many statistical columns, so there may not be enough data for certain types of evaluation. _Statistical columns are the modeled ones: categorical, boolean, datetime, numerical._

For example, consider a multi-table dataset where a table has only 1 numerical column (the others are all ID or PII types). If I try to run the quality report, there is nothing to compute for Column Pair Trends, because this property requires 2 or more statistical columns in the same table.
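To make the condition concrete, here is a minimal sketch of how one might check whether a table has enough statistical columns. It assumes the table metadata is a plain dict with a `columns` mapping and an `sdtype` per column; the helper names (`count_statistical_columns`, `can_compute_column_pair_trends`) are hypothetical and not part of SDMetrics.

```python
# The sdtypes this issue lists as "statistical" (modeled) columns.
STATISTICAL_SDTYPES = {"categorical", "boolean", "datetime", "numerical"}


def count_statistical_columns(table_metadata):
    """Count the columns whose sdtype is one of the statistical types.

    Assumes a dict shaped like {"columns": {name: {"sdtype": ...}}}.
    """
    columns = table_metadata.get("columns", {})
    return sum(
        1 for spec in columns.values()
        if spec.get("sdtype") in STATISTICAL_SDTYPES
    )


def can_compute_column_pair_trends(table_metadata, minimum=2):
    """Hypothetical check: a column pair needs at least `minimum` columns."""
    return count_statistical_columns(table_metadata) >= minimum


# A table like the one described above: one numerical column, the rest ID/PII.
example_table = {
    "columns": {
        "user_id": {"sdtype": "id"},
        "email": {"sdtype": "email"},
        "age": {"sdtype": "numerical"},
    }
}

print(count_statistical_columns(example_table))
print(can_compute_column_pair_trends(example_table))
```

With this table there is only one statistical column, so no column pair exists and the check returns `False`.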

Observed

The report correctly identifies that there is nothing to compute. As expected, the details are blank.

report.get_details('Column Pair Trends')
[Screenshot: empty details table]

However, the visualization is also blank because there was nothing computed.

[Screenshot: blank visualization]

Expected

It is a bit odd to see this type of visualization. Perhaps we could render a single blank graph with a text overlay that says something like:

"No data to display. This property requires at least 2 statistical columns within a single table."
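As a rough illustration of the idea, the sketch below builds a figure description with hidden axes and a centered text overlay. It is only a sketch under assumptions: the function name `build_placeholder_figure` is hypothetical, and the dict is written in the shape of Plotly's figure schema (`data` plus `layout` with `annotations`) so that it could be passed to `plotly.graph_objects.Figure`. It does not reflect how SDMetrics actually builds its visualizations.

```python
NO_DATA_MESSAGE = (
    "No data to display. This property requires at least 2 "
    "statistical columns within a single table."
)


def build_placeholder_figure(message=NO_DATA_MESSAGE, title=None):
    """Return a Plotly-style figure dict with no traces and a text overlay.

    Hypothetical helper: the axes are hidden and the message is anchored
    to the center of the plotting area using paper coordinates.
    """
    layout = {
        "xaxis": {"visible": False},
        "yaxis": {"visible": False},
        "annotations": [
            {
                "text": message,
                "xref": "paper",  # position relative to the plot area,
                "yref": "paper",  # not to any data coordinates
                "x": 0.5,
                "y": 0.5,
                "showarrow": False,
                "font": {"size": 16},
            }
        ],
    }
    if title is not None:
        layout["title"] = {"text": title}
    return {"data": [], "layout": layout}


figure_spec = build_placeholder_figure(title="Column Pair Trends")
```

If Plotly is installed, `plotly.graph_objects.Figure(figure_spec).show()` should render a single empty graph with the message in the middle. Keeping the message as a parameter would let other properties reuse the same placeholder with their own wording.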

Additional Context

It may be worth auditing and revisiting other cases where this can trigger as well. For example: