Gaussian KDE slower

Environment details

SDV version: sdv 0.17.1
Python version: 3.8.13
Operating System: Mac

Problem description

I'm trying to sample a 1D distribution using the gaussian_kde option of the field_distributions parameter of GaussianCopula(). real_data is a pd.DataFrame with a single column named 'Data'.

When I run

import numpy as np
from sdv.tabular import GaussianCopula

gc_synthetizer = GaussianCopula(field_distributions={'Data': 'gaussian_kde'})
gc_synthetizer.fit(np.round(real_data, 14))
synthetic_data = gc_synthetizer.sample(len(real_data))

It works, but it takes far longer than GaussianCopula() with default parameters. I tried different numbers of rows in real_data, and the run is 50 to 200 times slower with gaussian_kde. I also tried SciPy's gaussian_kde(), which is much faster to fit and sample from: it takes roughly the same time as GaussianCopula() with default parameters, or slightly longer.
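For reference, the SciPy baseline can be timed with something like the sketch below. The array `values` is a hypothetical stand-in for the 'Data' column, since the real data isn't included here, and the row count is arbitrary.

```python
import time

import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in for the single 'Data' column (real data not shown).
rng = np.random.default_rng(0)
values = rng.normal(size=1_000)

start = time.perf_counter()
kde = gaussian_kde(values)  # fit the kernel density estimate
# resample() returns an array of shape (1, n) for 1D input; take the first row.
samples = kde.resample(len(values), seed=0)[0]
elapsed = time.perf_counter() - start

print(f"SciPy fit + sample: {elapsed:.4f} s for {len(samples)} rows")
```

Wrapping the GaussianCopula fit and sample calls in the same `time.perf_counter()` pattern gives numbers that can be compared directly.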

sdv-dev / SDV

Gaussian KDE slower #1103
