question

mayang113 commented 3 months ago

I am a first-year graduate student and have just started learning about GANs. Thank you for your patient answers. I have two more questions.

Can I use data that doesn't start at the initial P-wave arrival for training? (I only applied filtering and baseline correction to the data.)
My data length is 120,000 samples, and the time interval is 0.000675 seconds. Can I still use your model?

yzshi5 commented 2 months ago

Hi, for your questions:

Q1. Can I use data that doesn't start from the initial P-wave for training? Using waveforms that start at the P-wave arrival has the following benefits:
(1) We treat the signal before the P-wave arrival as noise; removing that noise makes the training dataset easier to control and improves the performance of GANO.
(2) By forcing the waveforms in the training dataset to start with the P-wave arrival, we unify the outputs of GANO (which learns from the dataset). Specifically, we know that cGM-GANO generates waveforms that begin at the onset of the P arrival; otherwise you would have to determine where the P arrivals are in the generated waveforms.
(3) In the training dataset, each record has a fixed total duration (60 s in our case). As the duration of the noise increases, the duration of the signal correspondingly decreases.
(4) Baseline correction should be applied to waveforms that start at the P-wave arrival. The correct way to use it is, for each record, to apply the baseline correction only to the true waveform (starting at the P-wave arrival and ending when the motion stops). After you get the processed waveforms, you can pad or truncate the waveforms from different records to unify their duration.
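A minimal NumPy sketch of the per-record order of operations in point (4): trim to the true waveform, baseline-correct only that segment, then pad or truncate to a common length (6000 samples, i.e. 60 s at 100 Hz). The helper name `preprocess_record`, the indices `p_index` and `end_index` (assumed to come from your own P-wave picking), and the use of a linear least-squares detrend as the "baseline correction" are all assumptions for illustration, not the GM-GANO preprocessing code.

```python
import numpy as np


def preprocess_record(acc, p_index, end_index, target_len=6000):
    """Hypothetical helper: trim -> baseline-correct -> pad/truncate one record."""
    # 1) Keep only the true waveform: from the P-wave arrival to the end of motion.
    #    p_index / end_index are ASSUMED to be known (e.g. from your own picker).
    w = np.asarray(acc[p_index:end_index], dtype=float)

    # 2) Baseline correction on the trimmed segment only.
    #    ASSUMPTION: a linear least-squares detrend stands in for whatever
    #    baseline-correction method you actually use.
    t = np.arange(w.size)
    slope, intercept = np.polyfit(t, w, 1)
    w = w - (slope * t + intercept)

    # 3) Unify duration across records: truncate long records, zero-pad short ones.
    if w.size >= target_len:
        return w[:target_len]
    return np.pad(w, (0, target_len - w.size))
```

The key design point is the ordering: padding with zeros happens after the correction, so the padded tail never biases the baseline fit.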

Q2. My data length is 120,000, and the time interval is 0.000675 seconds. Can I still use your model? You can train the model on a training dataset with arbitrary duration and resolution; I just tested this and updated the model to the latest version. However, in your case, a record length of 120,000 samples will definitely choke any machine: there will not be enough GPU memory. Your sampling frequency is >1000 Hz, which is far beyond the frequency band of engineering interest (0.1 to 30 Hz). To give you an idea of the GPU memory usage: in our case, each record has 6000 samples, the sampling frequency is 100 Hz, and there are around 42k records. Training takes 3 to 4 days on a single NVIDIA V100 GPU (32 GB memory).
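A rough back-of-the-envelope check on these numbers, assuming (as an approximation, not a measured figure) that per-record activation memory in a neural operator grows roughly linearly with the number of grid points at fixed channel width and batch size. The single-channel float32 storage is also an assumption; set the channel count to match your data.

```python
BYTES_PER_FLOAT32 = 4


def grid_points(duration_s, fs_hz):
    """Samples in one record: duration times sampling frequency."""
    return round(duration_s * fs_hz)


def relative_cost(n_points, ref_points):
    """Per-record memory relative to a reference, ASSUMING linear growth in grid points."""
    return n_points / ref_points


ref_points = grid_points(60, 100)               # 6000 samples: 60 s at 100 Hz
user_points = grid_points(81, 1 / 0.000675)     # 120000 samples: 81 s at ~1481 Hz
# Raw size of ~42k single-channel float32 records (ASSUMED single channel), in GB.
raw_inputs_gb = 42_000 * ref_points * BYTES_PER_FLOAT32 / 1e9

print(ref_points, user_points, relative_cost(user_points, ref_points), raw_inputs_gb)
```

Under these assumptions the whole raw reference dataset is only about 1 GB, so the pressure on a 32 GB card presumably comes from per-batch activations, which would grow about 20 times for a 120,000-sample record.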

yzshi5 commented 2 months ago

cGM-GANO is built on the Generative Adversarial Neural Operator (GANO), not on a conventional GAN. The neural operator lets cGM-GANO learn mappings between function spaces. For example, you can train cGM-GANO on a dataset at a single resolution, say 100 Hz, and then use the trained model to generate waveforms at an arbitrary resolution (e.g., 50 Hz, 150 Hz, 200 Hz, 400 Hz), which is infeasible for a GAN.
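To illustrate why a neural operator can be evaluated at a resolution it was not trained on, here is a minimal NumPy sketch of an FNO-style spectral convolution, a layer type commonly used in neural operators. It is a hypothetical illustration, not code from the GM-GANO repository: the class name `SpectralConv1d`, the number of modes, and the random weights are assumptions. The weights live on a fixed number of Fourier modes rather than on grid points, so the same layer applies to a grid of any length.

```python
import numpy as np


class SpectralConv1d:
    """Hypothetical FNO-style spectral layer: weights act on Fourier modes, not grid points."""

    def __init__(self, n_modes, seed=0):
        rng = np.random.default_rng(seed)
        self.n_modes = n_modes
        # One complex weight per retained mode; the parameter count is independent of grid size.
        self.weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)
        # Keep the zero-frequency weight real so the output stays a real signal.
        self.weights[0] = self.weights[0].real

    def __call__(self, x):
        n = x.shape[-1]
        # norm="forward" divides by n, so a band-limited signal has the same
        # Fourier coefficients regardless of how many samples represent it.
        x_hat = np.fft.rfft(x, norm="forward")
        out_hat = np.zeros_like(x_hat)
        k = min(self.n_modes, x_hat.shape[-1])
        out_hat[..., :k] = x_hat[..., :k] * self.weights[:k]
        # Transform back onto the same grid the input was sampled on.
        return np.fft.irfft(out_hat, n=n, norm="forward")


def bandlimited_input(t):
    """Test function with 1 Hz and 3 Hz components on a 1 s window."""
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(2 * np.pi * 3 * t)


layer = SpectralConv1d(n_modes=8)
t_100 = np.arange(100) / 100.0   # 100 Hz grid ("training" resolution)
t_200 = np.arange(200) / 200.0   # 200 Hz grid ("evaluation" resolution)
y_100 = layer(bandlimited_input(t_100))
y_200 = layer(bandlimited_input(t_200))
print(np.allclose(y_100, y_200[::2]))
```

The same weights applied to the 100 Hz and 200 Hz samplings of one input give outputs that agree at the shared time points, which is the discretization-invariance property that a fixed-size GAN layer lacks.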

mayang113 commented 2 months ago

Thank you for your response. My data is the result of numerical simulation, with a frequency range of 0 to 2 Hz. Therefore, the time intervals are particularly small. Do you think this type of data is not suitable for your model?

yzshi5 commented 2 months ago

A frequency range of 0 to 2 Hz is a much easier problem, and you don't even need the preprocessing, since you can treat numerical simulation waveforms as noise-free data. You can refer to the "BBP verification" section of our paper for more information. The sampling frequency should be 1/(time interval), so it confuses me when you say the data length is 120,000 and the time interval is 0.000675 seconds.
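For reference, the figures in the question are consistent with each other: a time interval of 0.000675 s is a sampling frequency of about 1481 Hz, and 120,000 samples span 81 s. A small sketch (the function name `sampling_summary` is hypothetical) that computes these quantities and compares the Nyquist frequency with the 2 Hz upper limit of the simulated data:

```python
def sampling_summary(n_samples, dt_s, f_max_hz):
    """Basic facts about a uniformly sampled record."""
    fs = 1.0 / dt_s                    # sampling frequency = 1 / time interval
    nyquist = fs / 2.0                 # highest frequency the sampling grid can represent
    duration = n_samples * dt_s        # record length in seconds
    oversampling = nyquist / f_max_hz  # how far Nyquist exceeds the signal band
    return fs, nyquist, duration, oversampling


fs, nyquist, duration, oversampling = sampling_summary(120_000, 0.000675, f_max_hz=2.0)
print(f"fs = {fs:.1f} Hz, Nyquist = {nyquist:.1f} Hz, duration = {duration:.1f} s, "
      f"Nyquist/f_max = {oversampling:.0f}")
```

By the Nyquist criterion, content up to 2 Hz only needs a sampling frequency above 4 Hz, so a rate such as 100 Hz still leaves a wide margin.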

mayang113 commented 2 months ago

So as not to take up more of your time, this is my last question. My sampling frequency is indeed >1000 Hz; is that not necessary? Can I use your model if I change the sampling frequency so that my data length is shorter (each record is 81 s long)? And when I use an input length other than 6000, your model doesn't need to be changed anywhere, right?

yzshi5 commented 2 months ago

I would suggest resampling the observation data (not the 0-2 Hz simulated data) to 100 Hz. In my opinion, resampling the data to 50 Hz, 200 Hz, etc. should also be fine, depending on your GPU memory. Basically, to quickly adapt a new model to your dataset, you should try to keep the hyperparameters the same as those described in the paper. Once you get some preliminary results, you can confidently change the hyperparameters.
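One possible way to do the suggested resampling from dt = 0.000675 s (about 1481 Hz) to 100 Hz is SciPy's polyphase resampler `scipy.signal.resample_poly`, which applies a zero-phase anti-aliasing low-pass FIR filter as part of the rate change. This is a sketch under assumptions: the helper name `resample_to` and the synthetic 1 Hz input are illustrative only, and the repository may expect a different resampling route.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def resample_to(x, dt_s, target_fs_hz=100):
    """Hypothetical helper: resample a record sampled every dt_s seconds to target_fs_hz.

    new_fs / old_fs = target_fs * dt, computed exactly from the decimal time step,
    e.g. dt = 0.000675 s and 100 Hz give up/down = 27/400.
    """
    ratio = Fraction(target_fs_hz) * Fraction(str(dt_s))
    up, down = ratio.numerator, ratio.denominator
    # resample_poly low-pass filters while resampling, preventing aliasing.
    return resample_poly(x, up, down), up, down


dt = 0.000675                        # time interval from the question
t = np.arange(120_000) * dt          # 81 s record
x = np.sin(2 * np.pi * 1.0 * t)      # synthetic 1 Hz stand-in for a real record
y, up, down = resample_to(x, dt)
print(up, down, y.shape)
```

At 100 Hz the 81 s record shrinks from 120,000 to 8,100 samples; changing `target_fs_hz` to 50 or 200 gives the other options mentioned, at half or double that length.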

When the input length is not 6000, you don't need to change anything. The framework can take a training dataset with arbitrary sampling frequency and duration.

yzshi5 / GM-GANO

question #5