gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

DGAN for ECG dataset #162

Closed sanketahegde closed 1 month ago

sanketahegde commented 1 year ago

Hello,

I have been trying to apply the DoppelGANger (DGAN) model to my 1-lead ECG dataset to generate synthetic data, but after several attempts and tuning some basic hyperparameters, the model does not learn the ECG pattern. So I wanted to check whether DGAN is even applicable to biosignal or ECG data generation. Any suggestions are welcome!

Thank you in advance.

kboyd commented 1 year ago

Hi @sanketahegde, thanks for trying out DGAN and asking questions! In general, DGAN is quite good for biosignals when sufficient training data is available, but I know ECG data has very specific properties that need to be preserved. To get the most out of DGAN, I'd recommend thinking about the following items:

  1. What is an example? That is, how long are the sequences that DGAN independently generates (max_sequence_len parameter)? Generally speaking, shorter sequences are easier. I'd try sequences that only contain 2-3 heartbeats (maybe even just 1) to start, and then expand to longer sequences once you have the shorter sequences working.

  2. How much data do you have? DGAN really excels when there are lots of training examples, 10k or more, maybe even target 100k+ to learn the intricacies of ECGs. So if you use 2 seconds of ECG recording as the example length, that means 100k 2-second snippets. With time series, you can use sliding windows to increase the number of training examples if you're splitting up longer sequences. But that may also make the learning task a bit harder if each training sequence starts at a very different point in the ECG period. Definitely experiment with different ways to construct the training data.

  3. Hyperparameters are absolutely key. It's great you've explored some hyperparameter tuning. I've found the most impactful parameters to explore are learning rates and epochs. Besides finding the right order of magnitude, DGAN can be fairly sensitive to even 30% changes in these values, so a thorough exploration with grid search, or with a library like Optuna, can be really powerful. And of course having a good metric to optimize for is critical. There's not really a loss that can be used for early stopping with GANs, so using metrics related to ECGs would be best.
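
As a rough illustration of item 2, here is a minimal sketch of cutting one long recording into fixed-length training examples, with and without overlap. Everything in it is an assumption for illustration: the 250 Hz sampling rate, the sine wave standing in for an ECG, and the helper name `sliding_windows` are not from this thread or from gretel-synthetics. The resulting windows would still need to be converted into whatever array layout the training call expects.

```python
import math


def sliding_windows(signal, window_len, stride):
    """Split a 1-D signal into fixed-length windows taken every `stride` samples.

    Hypothetical helper, not part of gretel-synthetics.
    """
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    last_start = len(signal) - window_len
    # Trailing samples that do not fill a whole window are dropped.
    return [signal[start:start + window_len]
            for start in range(0, last_start + 1, stride)]


# Synthetic stand-in for a 1-lead ECG: 10 s at an assumed 250 Hz sampling rate.
fs = 250
ecg = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(10 * fs)]

window_len = 2 * fs  # 2-second examples, i.e. 500 samples per sequence

# Stride equal to the window length: no overlap between examples.
non_overlapping = sliding_windows(ecg, window_len, stride=window_len)

# Smaller stride: more examples, but each one starts at a different phase.
overlapping = sliding_windows(ecg, window_len, stride=fs // 2)

print(len(non_overlapping), len(overlapping))
```

The overlapping variant yields more examples from the same recording, at the cost described above: windows start at arbitrary points in the heartbeat cycle. Aligning window starts to a detected beat would be one way to reduce that variation, though that is a suggestion beyond what the thread covers.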

Hope that provides some experiment ideas. And if you're willing to share a notebook or code snippet of how you're setting up the training data and the model, I'm happy to take a look to see if there are any more specific recommendations.
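
The grid search over learning rates and epochs mentioned in item 3 could be sketched roughly as follows. This is only a skeleton: `train_and_score` is a hypothetical callable that would train a model and return an ECG-specific quality score (higher is better), `toy_score` is a placeholder so the example runs on its own, and the center values are arbitrary. The grid is spaced by a factor of 1.3 to reflect the roughly 30% sensitivity described above.

```python
import itertools
import math


def geometric_grid(center, factor, steps):
    """Return center * factor**k for k in [-steps, steps]."""
    return [center * factor ** k for k in range(-steps, steps + 1)]


def grid_search(train_and_score, learning_rates, epoch_counts):
    """Evaluate every (learning rate, epochs) pair and return the best one.

    `train_and_score(lr, epochs)` is a hypothetical callable that trains a
    model with those settings and returns a score where higher is better.
    """
    results = []
    for lr, epochs in itertools.product(learning_rates, epoch_counts):
        results.append((train_and_score(lr, epochs), lr, epochs))
    best_score, best_lr, best_epochs = max(results, key=lambda r: r[0])
    return best_lr, best_epochs, best_score, results


def toy_score(lr, epochs):
    # Placeholder objective only: peaks at lr=1e-3 and epochs=400.
    # Replace with real training plus an ECG-related metric.
    return -abs(math.log(lr / 1e-3)) - abs(epochs - 400) / 1000


learning_rates = geometric_grid(1e-3, factor=1.3, steps=2)  # 5 values, ~30% apart
epoch_counts = [200, 400, 800]

best_lr, best_epochs, best_score, results = grid_search(
    toy_score, learning_rates, epoch_counts
)
print(best_lr, best_epochs, len(results))
```

Because there is no loss suitable for early stopping, the score returned by `train_and_score` carries all the weight here, so the choice of ECG metric matters more than the search strategy itself.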

sanketahegde commented 1 year ago

Hi @kboyd ,

Thank you very much for your detailed reply with suggestions. As my work with DGAN is on hold, I shall try to apply your suggestions and update here if I get some better results.

Manuelhrokr commented 1 year ago

This is an interesting discussion.

I have been running some basic experiments on my TS data using DGAN. My main goal is to create synthetic time series while keeping, as best as possible, the fidelity and flexibility properties of my data (i.e., as stated by the original authors of the method in their paper). However, there's no free lunch, and for my particular case, having ~2k to 2.5k data samples of max_sequence_len = 24 is the best I can do, due to the hourly resolution of my data. Hence, following recommendations from @kboyd, I mostly rely on (3) to enhance, as much as possible, the fidelity and flexibility of the synthetic samples.

Finally, does the DGAN implementation allow using a seed S to generate N samples, with a different seed each time? That is, assuming I have 2k new 24-hour synthetic TS samples, I would like to use a new seed S to generate a new set of 2k synthetic samples, and so on. I assume a new run of DGAN would approximate this behavior, right?

Comments/feedback on these questions would be appreciated.

Thanks!
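
The seed question above is not answered in this thread, so the following is only a toy sketch of the general idea, not DGAN's API. It assumes that a trained generator turns random noise into sequences, so that re-seeding the noise source gives a new, reproducible set of samples without retraining. `ToyGenerator` and its `seed` argument are hypothetical; with a PyTorch-based model, calling `torch.manual_seed` before generation would typically play the same role, but whether that holds for DGAN is an assumption to verify against the library.

```python
import random


class ToyGenerator:
    """Stand-in for a trained generator: maps random noise to sequences.

    Hypothetical class for illustration; it is not the DGAN model.
    """

    def __init__(self, sequence_len):
        self.sequence_len = sequence_len

    def generate(self, n, seed):
        # A dedicated RNG per call keeps each batch reproducible from its seed.
        rng = random.Random(seed)
        return [[rng.gauss(0.0, 1.0) for _ in range(self.sequence_len)]
                for _ in range(n)]


model = ToyGenerator(sequence_len=24)  # 24 hourly steps per sample

batch_a = model.generate(2000, seed=1)        # first set of 2k samples
batch_b = model.generate(2000, seed=2)        # new seed, new set of 2k samples
batch_a_again = model.generate(2000, seed=1)  # same seed reproduces batch_a

print(batch_a == batch_a_again, batch_a == batch_b)
```

Under this assumption, a full retraining run is not needed just to obtain different samples; retraining would also change the model itself, which is a different source of variation than the generation seed.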

mckornfield commented 1 month ago

Closing as stale. I left comments in the other issue, but feel free to reopen.