uros-r closed this issue 2 weeks ago
Hi there @uros-r, to help me provide the best guidance, do you mind sharing more about your use case for controlling randomization?
Every time you sample from the same GaussianCopulaSynthesizer, you'll get new, random synthetic data. If you run the following code after your code example, s1 and s2 will have different values, as you are probably already aware!
synthesizer.fit(data=real_data)
s1 = synthesizer.sample(num_rows=500)
s2 = synthesizer.sample(num_rows=500)
You can use synthesizer.reset_sampling() to reset the randomization state to what it was when the synthesizer was fit. If you run the following code, s1 and s2 will be the same data.
synthesizer.fit(data=real_data)
s1 = synthesizer.sample(num_rows=500)
synthesizer.reset_sampling()
s2 = synthesizer.sample(num_rows=500)
You get a decent amount of control using these 2 methods, but again, more context on your use case would help!
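One way the two methods might be combined, as a minimal sketch: if the sequence of samples drawn after a reset is deterministic (the reset example above suggests this, but it is an assumption here), the n-th dataset can be regenerated by resetting and then sampling n times. `nth_sample` is a hypothetical helper, and `StandInSynthesizer` is a toy stand-in so the sketch runs without SDV installed; neither is part of SDV.

```python
import random


class StandInSynthesizer:
    """Toy stand-in (NOT SDV): mimics only sample() and reset_sampling()."""

    _SEED = 1234  # arbitrary fixed seed for this toy class

    def __init__(self):
        self._rng = random.Random(self._SEED)

    def reset_sampling(self):
        # Return the random state to what it was at construction time.
        self._rng = random.Random(self._SEED)

    def sample(self, num_rows):
        # Each call advances the internal random state.
        return [self._rng.random() for _ in range(num_rows)]


def nth_sample(synthesizer, n, num_rows):
    """Hypothetical helper: regenerate the n-th dataset (1-based) after a reset."""
    if n < 1:
        raise ValueError("n must be >= 1")
    synthesizer.reset_sampling()
    result = None
    for _ in range(n):
        # Earlier draws are discarded; they only advance the random state.
        result = synthesizer.sample(num_rows=num_rows)
    return result


synthesizer = StandInSynthesizer()
version_1 = nth_sample(synthesizer, 1, num_rows=5)
version_2 = nth_sample(synthesizer, 2, num_rows=5)
version_2_again = nth_sample(synthesizer, 2, num_rows=5)
```

With a real fitted synthesizer in place of the stand-in, this would only reproduce a given version if num_rows is the same on every call, and regenerating version n costs n sampling passes.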
Hi there @uros-r just following up :)
Hey @srinify - thanks for the suggestions, much appreciated.
Our use case involves creating a web-based tool that lets users interactively generate one or more anonymised dataset versions from a given source dataset.
For this, we came up with the workaround of using server-side session state to create and reuse synthesizer objects. This allows multiple calls to .sample() that produce different results, as you suggested.
Problem description
Hi - I'm looking for a way to seed the generation of synthetic data to produce different samples repeatedly.
Looks like this used to be supported in past versions of the library, and may still be, but I can't get it to work with the current version. May well be missing something obvious.
What I already tried
Using the getting started example + SDV 1.15:
I've looked at the docs, tried setting the global np.random state / seed and torch seed (as recommended in a now-dated issue).
Also tried setting FIXED_RNG_SEED in base.py to a different value.
In all cases, synthetic_data remains identical.
Appreciate any help.