sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Add reproducibility when fitting a synthesizer #2022

Open srinify opened 1 month ago

srinify commented 1 month ago

Problem Description

I want to improve my ability to evaluate synthesizers with different parameters, in different environments, and against each other.

Expected behavior

As a user, I'd like synthesizers to be fit deterministically, so that the same data and settings produce the same model and therefore the same synthetic data every time.

Potential API

There are situations where you do want a slightly different model to be trained on each run, so reproducibility should probably be opt-in through a parameter:

synthesizer.fit(original_data, random_state=1)
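
To make the proposal concrete, here is a minimal sketch of how a `random_state` argument on `fit` could behave. It uses a toy class rather than SDV itself: `ToySynthesizer`, its bootstrap "training" step, and its attributes are all hypothetical and only stand in for a real synthesizer with stochastic fitting. The idea is that `fit` builds a private random generator from the seed instead of relying on global random state, and leaving `random_state=None` keeps today's behavior of a different model on each run.

```python
import random
import statistics


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer; not part of SDV."""

    def fit(self, data, random_state=None):
        # A private generator keeps fitting independent of global random state.
        # With random_state=None it is seeded from OS entropy, so runs differ.
        rng = random.Random(random_state)
        # Stochastic "training": estimate parameters from a bootstrap resample.
        resample = [rng.choice(data) for _ in data]
        self.mean_ = statistics.fmean(resample)
        self.stdev_ = statistics.pstdev(resample)
        # Keep the generator so sampling continues the same seeded stream.
        self._rng = rng
        return self

    def sample(self, num_rows):
        # Draw synthetic values from the fitted normal distribution.
        return [self._rng.gauss(self.mean_, self.stdev_) for _ in range(num_rows)]


original_data = [4.1, 5.3, 6.8, 5.9, 4.7, 7.2, 5.5, 6.1]

first = ToySynthesizer().fit(original_data, random_state=1).sample(3)
second = ToySynthesizer().fit(original_data, random_state=1).sample(3)

# Same seed, same fitted parameters, same synthetic rows.
print(first == second)
```

A real implementation would also need to seed every library involved in training (for example NumPy and PyTorch for GAN-based synthesizers), which this sketch does not attempt.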

Additional context

Originally raised here: https://github.com/sdv-dev/CTGAN/issues/380#issuecomment-2109042846