gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

About DoppelGANger training results #146

Closed YYM0093 closed 1 year ago

YYM0093 commented 1 year ago

Hello, I have been fortunate to draw inspiration from Gretel, and I have run into some questions that I hope you can help me answer. The DoppelGANger model in Gretel is used to generate time series data. Let me explain my question with an example. I have a time series dataset whose "Date" column ranges from January 1, 2000 to December 31, 2000, with one record per day. According to the input format required by DoppelGANger, the data must be segmented into 3D data (, , max_sequence_len) according to "max_sequence_len". After model training, the generated data is also three-dimensional (, , max_sequence_len). For example, with max_sequence_len=5, the "Date" column of all the data I generated repeats between January 1, 2000 and January 5, 2000, and there is an "example_id" column. My questions are as follows:

  1. Does the data sharing the same "example_id" value represent the trend of the entire original data? If not, what does it represent?
  2. Can I restore the "Date" column of the generated data to the "Date" column of the original data (a complete time series)?

santhosh97 commented 1 year ago

Hi @YYM0093! Do you mind providing a snippet of what your dataframe looks like? That would give me a better understanding of what you are facing. The example id is primarily used to separate the different sequences in your data, and those sequences are what the DGAN trains on. The column you give is used to split a “long” format data frame into multiple examples; if None, the data is treated as a single example. At the moment, you cannot restore the "Date" column; however, a workaround could be to set attributes for different months, seasons, etc.
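
To make the windowing and the attribute workaround concrete, here is a minimal sketch in plain Python. It does not call the gretel-synthetics API. The helper `split_into_examples`, the placeholder `value` field, and the `start_month` attribute are all hypothetical names chosen for illustration, and dropping the incomplete trailing window is an assumption of this sketch, not documented library behavior. Each `example_id` marks one short window of `max_sequence_len` days rather than the whole-year series, and the per-example month attribute is one way to keep coarse calendar information.

```python
from datetime import date, timedelta

MAX_SEQUENCE_LEN = 5  # same value as in the question above


def split_into_examples(rows, max_sequence_len):
    """Assign an example_id to each row of a date-sorted daily series.

    rows: list of dicts, each with at least a "Date" key.
    Trailing rows that do not fill a complete window are dropped
    (an assumption of this sketch).
    """
    n_examples = len(rows) // max_sequence_len
    out = []
    for i in range(n_examples * max_sequence_len):
        example_id = i // max_sequence_len
        # First date of the window this row belongs to.
        window_start = rows[example_id * max_sequence_len]["Date"]
        row = dict(rows[i])
        row["example_id"] = example_id
        # Hypothetical per-example attribute: month in which the window starts.
        row["start_month"] = window_start.month
        out.append(row)
    return out


# Daily records for the year 2000 (a leap year, so 366 rows).
# "value" is placeholder data, not taken from the thread.
rows = [
    {"Date": date(2000, 1, 1) + timedelta(days=d), "value": float(d)}
    for d in range(366)
]
examples = split_into_examples(rows, MAX_SEQUENCE_LEN)
```

With rows shaped like this, `example_id` would be the column used to split the long-format data into examples, and `start_month` could be supplied as an attribute so that each generated sequence carries the month it is meant to resemble, even though exact dates are not recovered.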