sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Fix invalid metadata multi table demo datasets #1297

Open npatki opened 1 year ago

npatki commented 1 year ago

Environment Details

Error Description

A few of the multi-table demo datasets cannot be modeled using the HMASynthesizer because their metadata is invalid:

For some, there are relationships between two primary keys, which is invalid: a relationship must connect a primary key to a foreign key. Usually, we can fix this by making one of the primary keys a foreign key instead.
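To illustrate the kind of relationship described above, here is a minimal sketch that flags relationships whose child foreign key is also the child table's primary key. It works on a plain dictionary and assumes the layout that `metadata.to_dict()` produces for multi-table metadata (`tables`, `primary_key`, `relationships`, `child_foreign_key`, and so on). The helper name `find_pk_to_pk_relationships` and the example tables are hypothetical and are not part of SDV or of any demo dataset.

```python
def find_pk_to_pk_relationships(metadata_dict):
    """Return relationships whose child foreign key is also the child's primary key.

    Hypothetical helper: assumes the dict layout of multi-table metadata
    ('tables' mapping plus a 'relationships' list).
    """
    tables = metadata_dict.get('tables', {})
    flagged = []
    for relationship in metadata_dict.get('relationships', []):
        child = tables.get(relationship['child_table_name'], {})
        if child.get('primary_key') == relationship['child_foreign_key']:
            flagged.append(relationship)
    return flagged


# Made-up metadata for illustration only
example = {
    'tables': {
        'users': {'primary_key': 'user_id', 'columns': {'user_id': {'sdtype': 'id'}}},
        'profiles': {'primary_key': 'user_id', 'columns': {'user_id': {'sdtype': 'id'}}},
        'orders': {
            'primary_key': 'order_id',
            'columns': {'order_id': {'sdtype': 'id'}, 'user_id': {'sdtype': 'id'}},
        },
    },
    'relationships': [
        {
            'parent_table_name': 'users',
            'parent_primary_key': 'user_id',
            'child_table_name': 'profiles',
            'child_foreign_key': 'user_id',  # also the primary key of 'profiles'
        },
        {
            'parent_table_name': 'users',
            'parent_primary_key': 'user_id',
            'child_table_name': 'orders',
            'child_foreign_key': 'user_id',  # a regular foreign key
        },
    ],
}

flagged = find_pk_to_pk_relationships(example)
for relationship in flagged:
    print(relationship['parent_table_name'], '->', relationship['child_table_name'])
```

In this made-up example only the `users` to `profiles` relationship is flagged. One possible fix would be to stop marking `user_id` as the primary key of `profiles`, so that the column acts as a foreign key only.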

Steps to reproduce

Observe that not all the metadata objects pass validation:

from sdv.datasets.demo import download_demo, get_available_demos

# List every multi-table demo dataset
demos = get_available_demos(modality='multi_table')
all_datasets = list(demos['dataset_name'])

for dataset in all_datasets:
  data, metadata = download_demo(
    modality='multi_table',
    dataset_name=dataset
  )

  # Print the dataset name and the error for any metadata that fails validation
  try:
    metadata.validate()
  except Exception as e:
    print(dataset, e, '\n')

After fixing the issue, all the metadata should be valid, which means the code should print nothing.

npatki commented 7 months ago

Attaching a file below with the output of this script.

output.txt