sdv-dev / SDGym

Benchmarking synthetic data generation methods.

Add ability to load and inspect individual datasets #261

Open npatki opened 10 months ago

npatki commented 10 months ago

Problem Description

SDGym currently lets you list the datasets available for benchmarking, but it offers no way to inspect them. Users may want to see what the columns, data types, or values look like before including a dataset in a benchmarking run.

Expected behavior

Add a download_demo method that is similar to the one in the SDV library. This method would return the data and metadata so that SDGym users can inspect the dataset.

Workaround

SDV is a dependency of SDGym, so as a workaround you can access the demo datasets through it:

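# SDV is installed alongside SDGym, so its demo loader is already available.
from sdv.datasets.demo import download_demo

# Returns the dataset together with its metadata for inspection.
data, metadata = download_demo(
    modality='single_table',
    dataset_name='adult'
)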
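
Once the data is loaded, a quick summary covers the inspection described above (columns and data types). The following is only a sketch and not part of SDGym or SDV: `describe_table` and `load_and_describe` are hypothetical helper names, and the code assumes a single-table dataset, where `data` is a pandas DataFrame. For other modalities the returned data may have a different shape, and this helper would need adapting.

```python
def describe_table(data):
    """Summarize a pandas-style DataFrame: row count, column count, dtypes."""
    n_rows, n_columns = data.shape
    return {
        'rows': n_rows,
        'columns': n_columns,
        # Map each column name to the string form of its dtype.
        'dtypes': {column: str(dtype) for column, dtype in data.dtypes.items()},
    }


def load_and_describe(dataset_name, modality='single_table', loader=None):
    """Load a demo dataset and return (data, metadata, summary).

    `loader` is a hypothetical hook for this sketch; it defaults to SDV's
    download_demo, which fetches the dataset over the network.
    """
    if loader is None:
        # Imported lazily so defining these helpers does not require SDV.
        from sdv.datasets.demo import download_demo as loader
    data, metadata = loader(modality=modality, dataset_name=dataset_name)
    return data, metadata, describe_table(data)
```

Calling `data, metadata, summary = load_and_describe('adult')` would then download the dataset used above, and `data.head()` shows a few sample rows alongside the summary.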

npatki commented 10 months ago

For a related discussion, see #253