Open · mayang113 opened this issue 2 months ago
It's normal to see differences between the generated samples; that is how the model captures variability. GANO is a generative model, which is stochastic. Basically, you cannot make one-to-one comparisons; instead, compare statistics calculated from the generated samples.
Let me explain my point in detail:
Unlike deterministic numerical simulation methods, GANO is stochastic and takes a sample from a Gaussian process as input. You cannot make a one-to-one comparison for a stochastic method. That's why we group events to compare statistics (Figure 9 in our paper). The one-to-one comparison in the supplementary materials (Figure S3) is meant to give you a general sense of the model's performance, but it is not statistically rigorous. Quantitative comparison should involve engineering metrics such as FAS, RotD50, and residual analysis for the median and variability derived from the generated samples.
You can freely change the number of generated samples per condition; 100 is not a fixed number. When calculating the statistics, you should also quantify the uncertainty (standard deviation in log scale). Uncertainty quantification is very important in earthquake engineering (that's why residual analysis is widely used).
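To make the statistics point concrete, here is a minimal sketch of summarizing an ensemble of generated samples by its log-space median and standard deviation. The synthetic PGA values below are made up for illustration; only the log-space mean/std pattern reflects the practice described above.

```python
import numpy as np

# Hypothetical example: 100 GANO realizations for one scenario, summarized
# by peak ground acceleration (PGA). The samples here are synthetic.
rng = np.random.default_rng(0)
samples = 10 ** rng.normal(loc=-1.0, scale=0.3, size=100)  # toy PGA values

log_samples = np.log10(samples)
median = 10 ** np.mean(log_samples)    # mean in log space -> geometric median
log_std = np.std(log_samples, ddof=1)  # variability: std in log10 units

print(median, log_std)
```

The standard deviation in log scale is the uncertainty measure mentioned above; comparing it (and the median) between observed and generated ensembles is what replaces one-to-one waveform matching.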
I also recommend reading the previous work developed by our group, which may help you better understand the validation pipeline: https://pubs.geoscienceworld.org/ssa/bssa/article/112/4/1979/613199/Data-Driven-Synthesis-of-Broadband-Earthquake
I would suggest reading the references in the introduction of our paper, which can be useful for understanding how people validate different waveform generation methods.
Hello author. In your paper, why are there four acceleration time histories for a specific scenario? Shouldn't there be only one?
My understanding: the right panel shows 4 acceleration time histories randomly selected from the 100 generated by GANO under the conditions M 4.5, 50 km, 300 m/s, shallow crustal. The left panel shows records randomly selected within a neighborhood of the same condition (M 4.5, 50 km, 300 m/s, shallow crustal).
Yes, your understanding is correct.
The dataset you collected is always discrete w.r.t. the conditional variables, so a perfect match for any combination of conditions is almost impossible. Thus, for a specific scenario, you actually define narrow bands for each condition. For example, "Observation M 4.5, 50 km, 300 m/s" represents M 4.4-4.6, 40-60 km, 250-350 m/s. If you have a large dataset, you can choose a smaller bandwidth for each condition, which will make the comparison more accurate.
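The band selection described above can be sketched as a boolean mask over the catalog's conditional variables. The field names, catalog values, and bandwidths below are illustrative, not taken from the released code.

```python
import numpy as np

# Hypothetical catalog of 1000 records with their conditional variables.
rng = np.random.default_rng(1)
mag = rng.uniform(3.0, 7.0, size=1000)     # magnitude
rrup = rng.uniform(10.0, 200.0, size=1000)  # rupture distance, km
vs30 = rng.uniform(150.0, 800.0, size=1000)  # site Vs30, m/s

# Narrow bands around the target scenario M 4.5, 50 km, 300 m/s.
mask = (
    (np.abs(mag - 4.5) <= 0.1)        # M 4.4-4.6
    & (np.abs(rrup - 50.0) <= 10.0)   # 40-60 km
    & (np.abs(vs30 - 300.0) <= 50.0)  # 250-350 m/s
)
n_in_bin = int(mask.sum())
print(n_in_bin)  # number of observations falling in the bin
```

Shrinking the half-widths (0.1, 10.0, 50.0) tightens the bin, which improves comparability at the cost of fewer observations per bin.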
Hello author
It's based on the acceleration, and the FAS is normalized (FAS_normalized = FAS * dt).
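As a small sketch of the FAS * dt convention mentioned above, computed from a toy acceleration signal (the signal and sampling interval are made up):

```python
import numpy as np

dt = 0.01                            # sampling interval, s (assumed)
t = np.arange(0, 10, dt)
acc = np.sin(2 * np.pi * 2.0 * t)    # toy 2 Hz acceleration time history

fas = np.abs(np.fft.rfft(acc))       # raw one-sided FFT amplitude
fas_normalized = fas * dt            # the normalization used in this thread
freqs = np.fft.rfftfreq(len(acc), d=dt)

peak_freq = freqs[np.argmax(fas_normalized)]
print(peak_freq)  # 2.0 (the toy signal's frequency)
```

Multiplying by dt makes the discrete amplitude approximate the continuous Fourier transform, so spectra from records with different sampling rates remain comparable.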
Yes, in the shown scenario we have 76 observations in the bin. For each observation, we provide the same metadata (mag, vs30, rrup, f_type) to GANO and ask it to generate only 1 realization. In this way, the number of synthesized records matches the number of observations.
Hello, author
Yes, that's the postprocessing code. If you train the model with acceleration data, the GANO model will generate acceleration time histories
You can modify the code; we added that part because we are not allowed to share our dataset. That function converts the normalized log10_PGA back to the actual PGA, and it is a built-in method of the SeisData class.
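For readers without access to the dataset, here is a hypothetical illustration of that kind of denormalization. The actual scaling constants live inside the authors' SeisData class; the mean and std below are made up, and only the undo-z-score-then-exponentiate pattern is the point.

```python
import numpy as np

# Assumed dataset statistics for log10(PGA); placeholders, not real values.
log_pga_mean, log_pga_std = -2.0, 0.8

def denormalize_pga(x_norm):
    # Undo the z-score normalization in log space, then return to PGA.
    log_pga = x_norm * log_pga_std + log_pga_mean
    return 10.0 ** log_pga

pga = denormalize_pga(0.0)
print(pga)  # a normalized value of 0 maps back to 10**mean
```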
Thanks to the author for answering my questions; GANO is so advanced that no one around me has this knowledge.
Yes, change the 3 to 2
For the conditional variables, I would suggest keeping only magnitude and rupture distance, since Vs30 is a constant, not a variable, in your case.
I didn't understand your question well. What do you mean by "the generated data is a multiple of the recorded data"? Besides, we usually use log-log plots for the Fourier amplitude spectrum.
From the figure you showed, there is a constant offset between the ground-truth mean and the predicted mean. You may need to check your postprocessing method.
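A quick residual check makes this kind of error easy to spot: a constant offset in log space corresponds to a constant multiplicative factor (e.g., a unit or dt mistake) in the postprocessing. The data below are synthetic, with a deliberate factor-of-2 bias injected.

```python
import numpy as np

rng = np.random.default_rng(3)
truth = 10 ** rng.normal(-1.0, 0.3, size=200)  # toy observed amplitudes
pred = truth * 2.0                             # injected constant bias

# Residuals in log10 space; a scaling error shows up as a flat offset.
residual = np.log10(pred) - np.log10(truth)
bias = residual.mean()
print(bias)  # ~0.301 = log10(2), constant across all samples
```

If the residual is flat like this (near-zero scatter around a nonzero mean), look for a constant factor in the pipeline rather than a modeling problem.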
Hello author, I am once again having problems 😭😭😭. When I use the trained model to generate acceleration time histories, the program is set to generate 100 samples, but there are big differences among these 100 samples. Is this normal? Is it OK to choose the best one of the 100 (the one that best fits the observed data) as the final generated result?