gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

ValueError: multiprocessing_context option should specify a valid start method in ['spawn'], but got multiprocessing_context='fork' [FR / BUG] #156

Closed: jordycollingwood closed this issue 1 year ago

jordycollingwood commented 1 year ago

Hi there,

When running the following sample code it throws a multiprocessing error:

ValueError: multiprocessing_context option should specify a valid start method in ['spawn'], but got multiprocessing_context='fork'


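import numpy as np
import pandas as pd

from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig

# Create some random training data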
df = pd.DataFrame(np.random.random(size=(1000,30)))
df.columns = pd.date_range("2022-01-01", periods=30)
# Include an attribute column
df["attribute"] = np.random.randint(0, 3, size=1000)

# Train the model
model = DGAN(DGANConfig(
    max_sequence_len=30,
    sample_len=3,
    batch_size=1000,
    epochs=10,  # For real data sets, 100-1000 epochs is typical
))

model.train_dataframe(
    df,
    attribute_columns=["attribute"],
    discrete_columns=["attribute"],
)

# Generate synthetic data
synthetic_df = model.generate_dataframe(100)

synthetic_df

I am using a Windows machine and Jupyter Notebook, but I have also tested it in a terminal and get the same error.

I found this Stack Overflow question, which suggests it is a Windows issue: https://stackoverflow.com/questions/76076183/how-do-i-set-multiprocessing-context-to-spawn-in-my-code

Is there any possibility that the multiprocessing_context used by the DGAN dataloader could be made configurable?
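
To illustrate what I mean, here is a minimal sketch of choosing a start method that the current platform supports. It only uses the standard library; `pick_start_method` is a hypothetical helper, not something that exists in gretel-synthetics, and I am assuming the result could then be passed as `multiprocessing_context` to the dataloader.

```python
import multiprocessing


def pick_start_method(preferred: str = "fork") -> str:
    """Return `preferred` if this platform supports it, else fall back to "spawn".

    Hypothetical helper for illustration only. "spawn" is available on every
    platform, while "fork" is not available on Windows.
    """
    available = multiprocessing.get_all_start_methods()
    if preferred in available:
        return preferred
    return "spawn"


if __name__ == "__main__":
    # On Windows the available list contains only "spawn", which matches the
    # ['spawn'] shown in the error message above.
    print("available:", multiprocessing.get_all_start_methods())
    print("chosen:", pick_start_method("fork"))
```

On my Windows machine I would expect this to choose "spawn", whereas on Linux "fork" should still be picked.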

Thanks in advance for any help with this.

Jordan

Marjan-emd commented 1 year ago

Hi @jordycollingwood, we do not support source available license (SAL) code on Windows at the moment.