Problem Description
The SDMetrics library comes with built-in multi-table demo data that you can use to explore the reports. It includes real data, synthetic data, and the metadata, all hardcoded in this folder.
The problem is that the synthetic data was created a long time ago using very old versions of the SDV. Those older versions had many bugs, so the synthetic data does not match the real data on several important qualities. In particular, BoundaryAdherence is not met for transactions.amount, users.age and transactions.timestamp, because at the time SDV was not adhering to the min/max values of the real data.
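To make the failing check concrete, here is a minimal sketch of what a boundary-adherence style score measures: the fraction of synthetic values that fall inside the min/max range of the real column. This is an illustration in plain Python, not SDMetrics' own implementation; the function name `boundary_adherence`, the handling of missing values, and the sample ages are all assumptions made up for the example.

```python
def boundary_adherence(real_values, synthetic_values):
    """Return the fraction of synthetic values inside [min(real), max(real)].

    Illustrative only: missing values (None) are dropped on both sides,
    which is an assumption of this sketch.
    """
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    if not real or not synthetic:
        raise ValueError("both columns need at least one non-missing value")

    low, high = min(real), max(real)
    in_range = sum(1 for v in synthetic if low <= v <= high)
    return in_range / len(synthetic)


# Made-up values standing in for a column such as users.age.
real_age = [18, 25, 40, 67]
synthetic_age = [17, 30, 45, 70]  # 17 and 70 fall outside [18, 67]

score = boundary_adherence(real_age, synthetic_age)
print(score)
```

A score below 1.0, as in this example, means some synthetic values fall outside the real range, which is the situation described for the three columns above.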
Expected behavior
Update the synthetic data available for the multi-table demo. We can do this by:
1. Keeping the same metadata and real data
2. Running the real data through the HSASynthesizer
3. Sampling new synthetic data and saving it in place of the current synthetic data
Additional context
Once this is done, the Diagnostic Report for the demo data should have a score of 1.0 (i.e. the BoundaryAdherence should be met for every column).
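One way to confirm the outcome is to run the multi-table Diagnostic Report on the demo data and list any BoundaryAdherence rows that still score below 1.0. The sketch below assumes that the details table for the "Data Validity" property has `Table`, `Column`, `Metric` and `Score` columns; the helper names `failing_boundary_checks` and `check_demo` are hypothetical.

```python
def failing_boundary_checks(detail_rows):
    """Return (table, column, score) for BoundaryAdherence rows scoring below 1.0.

    `detail_rows` is a list of dicts, one per row of the report details.
    The key names are an assumption about the details table layout.
    """
    return [
        (row["Table"], row["Column"], row["Score"])
        for row in detail_rows
        if row["Metric"] == "BoundaryAdherence" and row["Score"] < 1.0
    ]


def check_demo():
    """Generate the Diagnostic Report on the built-in multi-table demo."""
    # Imported here so the helper above works without sdmetrics installed.
    from sdmetrics.demos import load_demo
    from sdmetrics.reports.multi_table import DiagnosticReport

    real_data, synthetic_data, metadata = load_demo(modality="multi_table")
    report = DiagnosticReport()
    report.generate(real_data, synthetic_data, metadata, verbose=False)

    details = report.get_details(property_name="Data Validity")
    failures = failing_boundary_checks(details.to_dict("records"))
    return report.get_score(), failures
```

With the refreshed synthetic data, `check_demo()` would be expected to return an overall score of 1.0 and an empty list of failures.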