sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License
201 stars 45 forks

Allow me to visualize just the real or synthetic data #581

Closed npatki closed 2 months ago

npatki commented 3 months ago

Problem Description

All of the SDMetrics visualization utilities require that I pass in both real and synthetic data in order to see the visualization. However, in some cases, I may have only the real data (or only the synthetic data). In these cases, it would still be useful to use the utilities.

Expected behavior

The API for the utilities can remain as-is. However, we can accept None as a valid value for either real_data or synthetic_data. The logic can change as follows: if only one of the two is provided, plot just that dataset; if both are None, raise an error.

Error: No data provided to plot. Please provide either real or synthetic data.

This should apply to all utility visualizations: get_column_plot, get_column_pair_plot, and get_cardinality_plot.
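The None handling described above could be sketched roughly as follows. This is only an illustration and not the SDMetrics implementation: the helper name `resolve_plot_data`, the 'Real' and 'Synthetic' labels, and the choice of `ValueError` are all assumptions made for the example.

```python
def resolve_plot_data(real_data=None, synthetic_data=None):
    """Hypothetical helper: collect whichever datasets were provided.

    Returns a dict mapping a legend label to its dataset. The labels
    'Real' and 'Synthetic' are assumed for illustration only.
    """
    if real_data is None and synthetic_data is None:
        # Error text taken from the issue; the exception type is an assumption.
        raise ValueError(
            'No data provided to plot. Please provide either real or synthetic data.'
        )

    datasets = {}
    if real_data is not None:
        datasets['Real'] = real_data
    if synthetic_data is not None:
        datasets['Synthetic'] = synthetic_data

    return datasets
```

Each of the three utilities could then loop over the returned dict and draw one trace per entry, so passing a single dataset would produce a single-trace plot without any change to the public signatures.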

Additional context

Until we add this feature, here is a quick-and-dirty workaround: Supply the same dataset for both the real_data and synthetic_data parameters. Since the rendered plot is interactive, you can then click the legend to "hide" one of the two series.

from sdmetrics.visualization import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_data,
    synthetic_data=real_data,
    column_names=['column_a', 'column_b'],
    plot_type='scatter'
)

fig.show()