To model multi-table data, SDV expects every foreign key value in a child table to match a primary key value in its parent table. In other words, the dataset must have referential integrity, with no orphan child rows.
Of all the demo datasets, Carcinogenesis_v1 and Toxicology_v1 do not have referential integrity and so cannot be modeled by any of the multi-table synthesizers.
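To make "orphan children" concrete, here is a minimal sketch in plain Python. It is only an illustration of the check, not SDV's implementation. The function name `find_orphan_keys` and the toy tables are hypothetical and are not taken from the demo datasets.

```python
def find_orphan_keys(parent_rows, child_rows, primary_key, foreign_key):
    """Return the foreign key values in child_rows that have no matching parent."""
    known_keys = {row[primary_key] for row in parent_rows}
    return {
        row[foreign_key]
        for row in child_rows
        if row[foreign_key] not in known_keys
    }


# Hypothetical toy tables, made up for this illustration.
molecules = [{'molecule_id': 'm1'}, {'molecule_id': 'm2'}]
atoms = [
    {'atom_id': 'a1', 'molecule_id': 'm1'},
    {'atom_id': 'a2', 'molecule_id': 'm2'},
    {'atom_id': 'a3', 'molecule_id': 'm9'},  # orphan: there is no molecule 'm9'
]

orphans = find_orphan_keys(molecules, atoms, 'molecule_id', 'molecule_id')
print(orphans)  # {'m9'}
```

A dataset has referential integrity for a relationship exactly when this set is empty.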
Detailed Output
The metadata itself is valid, but the referential integrity is broken. The output of calling metadata.validate_data(data) on these datasets is in output.txt.
We used the drop_unknown_references feature to remove any unknown foreign key values. Both datasets now have referential integrity and work with the SDV synthesizers.
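The cleanup described above might look roughly like the sketch below. It assumes an SDV version in which `drop_unknown_references` can be imported from `sdv.utils` and `download_demo` from `sdv.datasets.demo`. The wrapper name `clean_demo_dataset` is hypothetical, and the broad `except` is a simplification rather than the exact error type SDV raises.

```python
def clean_demo_dataset(dataset_name):
    """Download a multi-table demo dataset and drop its orphan child rows."""
    # Imports are deferred so that defining this function does not require SDV.
    from sdv.datasets.demo import download_demo
    from sdv.utils import drop_unknown_references

    data, metadata = download_demo(
        modality='multi_table',
        dataset_name=dataset_name,
    )

    try:
        # Fails when a foreign key value has no matching primary key.
        metadata.validate_data(data)
    except Exception as error:  # assumption: catch broadly instead of naming SDV's error type
        print(f'{dataset_name} failed validation:\n{error}')

    # Remove child rows whose foreign key values are unknown in the parent table.
    cleaned_data = drop_unknown_references(data, metadata)

    # The cleaned data is expected to pass validation without raising.
    metadata.validate_data(cleaned_data)
    return cleaned_data, metadata
```

Calling `clean_demo_dataset('Carcinogenesis_v1')` or `clean_demo_dataset('Toxicology_v1')` needs network access to fetch the demo data. Note that dropping rows changes the dataset, so the cleaned tables are smaller than the originals.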
Fix
TBD. We can either remove these datasets from the demo altogether, or find a subsample of rows that do maintain referential integrity.