Closed FrancisNji closed 1 year ago
Thanks for the question @Frankie0609! I'm guessing this question is for the timeseries_dgan model. Let me know if that's not the case.
In our DGAN model, `sample_len` controls some internals of how we model a time series. The `max_sequence_len` parameter is the number of time points in each of your example time series. `sample_len` must divide `max_sequence_len` evenly; it is used to implicitly split each sequence into smaller chunks for the model to work with. Specifically, DGAN uses an RNN architecture, and `sample_len` is the number of time points generated by each cell of the RNN.
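To make the divisibility constraint concrete, here is a small, library-free Python sketch (the helper names are mine, not part of the Gretel API) showing which `sample_len` values are valid for a given `max_sequence_len`, and how many RNN cells each choice implies:

```python
def valid_sample_lens(max_sequence_len: int) -> list[int]:
    """All sample_len values that divide max_sequence_len evenly."""
    return [s for s in range(1, max_sequence_len + 1) if max_sequence_len % s == 0]

def rnn_cells(max_sequence_len: int, sample_len: int) -> int:
    """Each RNN cell emits sample_len time points, so the unrolled RNN
    has max_sequence_len // sample_len cells."""
    if max_sequence_len % sample_len != 0:
        raise ValueError("sample_len must divide max_sequence_len evenly")
    return max_sequence_len // sample_len

print(valid_sample_lens(20))  # [1, 2, 4, 5, 10, 20]
print(rnn_cells(20, 5))       # 4 cells, each generating 5 time points
```

A larger `sample_len` means fewer RNN cells (a shorter unrolled network), which is why it trades model size against per-epoch training time, as described below.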
We recommend `sample_len=1` for shorter time sequences, say up to ~20 time points (`max_sequence_len=20`). For longer sequences, experimenting with different values of `sample_len` lets you explore the tradeoff between a larger model that probably requires more data to train (small `sample_len`) and a smaller model with faster per-epoch training (larger `sample_len`). It can also be very useful when you know there is periodicity in your data, e.g., `sample_len=7` for daily data with weekly patterns, though this is not required.
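For reference, here is a hedged sketch of how these two parameters fit together in a gretel-synthetics config for the daily-data-with-weekly-patterns case. Treat the exact import paths, parameter names, and method names as assumptions about the library's API; check the current documentation before using:

```python
# Sketch only: import paths and method names below are assumptions about
# the gretel-synthetics DGAN API, not verified against a specific version.
import numpy as np
from gretel_synthetics.timeseries_dgan.config import DGANConfig
from gretel_synthetics.timeseries_dgan.dgan import DGAN

# Daily data with weekly seasonality: 28 time points per example,
# so sample_len=7 (one week per RNN cell) divides max_sequence_len evenly.
config = DGANConfig(
    max_sequence_len=28,
    sample_len=7,
    batch_size=100,
    epochs=100,
)

model = DGAN(config)

# features: array of shape (n_examples, max_sequence_len, n_features)
features = np.random.rand(1000, 28, 2).astype("float32")
model.train_numpy(features)

# Generate 100 synthetic sequences with the same shape per example.
_, synthetic_features = model.generate_numpy(100)
```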
There are a few places to learn more about this model. For this particular implementation, see our blog posts https://gretel.ai/blog/create-synthetic-time-series-with-doppelganger-and-pytorch and https://gretel.ai/blog/generate-time-series-data-with-gretels-new-dgan-model. Our PyTorch implementation is based on the DoppelGANger model published in https://arxiv.org/abs/1909.13403; that paper includes some discussion of making `sample_len` a configurable parameter of the model.
Hope that information helps! Let me know if you have any other questions.
Many thanks for this clarification!
Are you reporting a bug or FR? No.