sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv
Other

PAR model should be able to reproduce a smooth timeseries #523

Open arsinnius opened 3 years ago

arsinnius commented 3 years ago

Environment details

The code was run on Colab

Problem description

I'm using PAR to model macro variables. First, I modeled the VIX volatility index with no apparent problem. Next, I tried gross domestic product. The original GDP is plotted in the first figure - a smooth curve. The sample is plotted in the second figure. Something is clearly wrong. This raises doubt about the validity of the VIX sample.

[image: original GDP series] [image: PAR sample]

The data was downloaded from the FRED website as a csv file and converted to a pandas df. The df has two columns - DATE and GDP.

What I already tried

I tried a second time and got the following: image

Here is the code I ran:

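import pandas as pd
from sdv.timeseries import PAR

# parse_dates already converts DATE to datetime, so no separate pd.to_datetime call is needed
date_cols = ['DATE']
gdp_df = pd.read_csv('./csv_data/GDP.csv', parse_dates=date_cols)
gdp_df.plot(x='DATE')

# Fit PAR on the single GDP sequence, using DATE as the sequence index
sequence_index = 'DATE'
model = PAR(sequence_index=sequence_index)
model.fit(gdp_df)

# Draw one synthetic sequence and plot it against DATE
gdp_syn_1 = model.sample()
gdp_syn_1.plot(x='DATE')
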
npatki commented 2 years ago

Thanks for the detailed explanation and feedback @arsinnius. I am able to get some improvement by increasing epochs. Unfortunately, the sample never ends up looking like a smooth curve. It seems this pattern is difficult for the neural network to learn.

I'm curious about how you're planning to use the synthetic data? A key strength of the PAR model is being able to model multi-sequence data. Since you only have a single sequence, I'm wondering if it could make sense to model this particular data as tabular and sort the synthetic values by DATE afterwards.

E.g.:

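from sdv.tabular import GaussianCopula

# Model the rows as plain tabular data, ignoring their order
model = GaussianCopula(default_distribution='beta')
model.fit(gdp_df)
synthetic_data = model.sample(num_rows=289)

# Sort by DATE; drop=True avoids adding the old index as an extra column
synthetic_data = synthetic_data.sort_values(by=['DATE']).reset_index(drop=True)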

This got me relatively better accuracy, though the curve still wasn't smooth.

image
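
"Smooth" is judged by eye in the plots above. One way to put a number on it is a roughness score based on second differences. The snippet below is a minimal sketch, not part of SDV: the `roughness` helper and both example series are hypothetical stand-ins (an exponential trend in place of the real GDP data, and the same trend with independent per-row noise in place of a synthetic sample). It uses only the standard library.

```python
import math
import random


def roughness(values):
    """Mean absolute second difference, scaled by the value range.

    Close to 0 for a smooth curve; larger when neighbouring points jump around.
    """
    if len(values) < 3:
        raise ValueError("need at least three points")
    span = max(values) - min(values)
    if span == 0:
        return 0.0
    second_diffs = [
        abs(values[i + 1] - 2 * values[i] + values[i - 1])
        for i in range(1, len(values) - 1)
    ]
    return sum(second_diffs) / len(second_diffs) / span


# Hypothetical stand-in for a smooth GDP-like series (steady exponential growth).
smooth = [100.0 * math.exp(0.015 * t) for t in range(289)]

# Hypothetical stand-in for a synthetic sample: same trend, but every row gets
# independent noise, as happens when rows are sampled independently and only
# sorted by date afterwards.
rng = random.Random(0)
noisy = [v * (1.0 + rng.gauss(0.0, 0.05)) for v in smooth]

print(f"smooth series roughness: {roughness(smooth):.6f}")
print(f"noisy series roughness:  {roughness(noisy):.6f}")
```

Applied to the real and synthetic GDP columns (for example `roughness(list(gdp_df['GDP']))`), a score like this would give a simple way to compare runs, such as different epoch counts, without relying on plots alone.
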
npatki commented 2 years ago

Hello, I'm turning this into a feature request and slightly renaming it. We'll keep this open to track as we make progress.