sdv-dev / CTGAN

Conditional GAN for generating synthetic tabular data.

Replace integration test that uses the iris demo data #352

Closed: npatki closed this issue 4 months ago

npatki commented 5 months ago

Problem Description

The test_tvae integration test currently uses the iris demo dataset from scikit-learn.

https://github.com/sdv-dev/CTGAN/blob/37cc86cdf72d844b8ced1823cadb3ebbee2b623d/tests/integration/synthesizer/test_tvae.py#L22

However, as of the latest merge, scikit-learn is no longer a dependency of CTGAN, so we shouldn't be using it for any testing. It would be better to replace this dataset with a different demo dataset (or potentially a hard-coded one).
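As a rough illustration of the hard-coded option, the sketch below builds a small, deterministic table with the same shape as iris (four numeric columns and one categorical column) using only the standard library. The function name make_demo_data, the column names, and the values are all hypothetical and are not taken from the CTGAN repository; the numbers are arbitrary, not real iris measurements.

```python
import random


def make_demo_data(n_rows=150, seed=0):
    """Build an iris-shaped table: four numeric columns plus one categorical.

    Hypothetical helper for illustration only; the values are arbitrary.
    """
    rng = random.Random(seed)  # fixed seed so every test run sees the same data
    classes = ['a', 'b', 'c']
    columns = {'col_1': [], 'col_2': [], 'col_3': [], 'col_4': [], 'class': []}
    for i in range(n_rows):
        offset = i % len(classes)
        # Shift each numeric column by the class index so classes are separable.
        for j in range(1, 5):
            columns[f'col_{j}'].append(round(rng.gauss(offset + j, 0.5), 2))
        columns['class'].append(classes[offset])
    return columns


data = make_demo_data()
```

In the test itself, the returned dict could presumably be wrapped in a pandas DataFrame and passed to TVAE.fit with 'class' listed as the discrete column, which would remove the scikit-learn import without changing what the test exercises.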

Additional context

If we don't require scikit-learn anymore, then why is this integration test still passing? That's because CTGAN requires RDT to run, and RDT requires scikit-learn.

In any case, we'd like to clean up dependencies within each library. Since the functionality of CTGAN doesn't directly need scikit-learn, we should not have it referenced in this repo.
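One way to confirm that no references remain is to scan the source for imports of the package. The following is a minimal sketch, assuming the goal is only to catch import statements; scikit-learn is imported under the module name sklearn. The helper name find_imports is hypothetical and not part of CTGAN.

```python
import ast


def find_imports(source, package='sklearn'):
    """Return the line numbers of import statements that reference `package`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `module` is None for relative imports such as `from . import x`.
            names = [node.module or '']
        else:
            continue
        if any(n == package or n.startswith(package + '.') for n in names):
            hits.append(node.lineno)
    return sorted(hits)


example = "import pandas as pd\nfrom sklearn import datasets\nimport sklearn.utils\n"
print(find_imports(example))  # prints [2, 3]
```

Running a helper like this over each file's text under the tests directory would flag any leftover scikit-learn usage that still passes only because RDT installs the package.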