smousavi05 / EQTransformer

EQTransformer, a Python package for earthquake signal detection and phase picking using AI.
https://rebrand.ly/EQT-documentations
MIT License

retrain models #71

Closed chengxinjiang closed 3 years ago

chengxinjiang commented 3 years ago

Thanks for making this very useful tool available.

I am trying to re-train the model on STEAD, but with a 1-25 Hz bandpass filter applied first to mimic my input data. I have a question regarding the two trained models provided in the repository. According to the documentation, one minimizes the false-positive rate and the other minimizes the false-negative rate. Could you please provide a bit more detail on how these two priorities are reflected in the training parameters? My aim is to recreate the default model, the one minimizing false positives. Thanks!
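For concreteness, here is a rough sketch of the filtering step I have in mind, assuming STEAD's 100 Hz, 60 s, three-component traces stored as (6000, 3) arrays in HDF5 (the file name and trace name below are placeholders):

```python
import h5py
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(trace, fmin=1.0, fmax=25.0, fs=100.0, order=4):
    """Zero-phase Butterworth bandpass applied along the time axis."""
    sos = butter(order, [fmin, fmax], btype='bandpass', fs=fs, output='sos')
    return sosfiltfilt(sos, trace, axis=0)

# STEAD stores each 60 s, three-component trace as a (6000, 3) dataset
# under the 'data' group; 'merged.hdf5' and the trace name are placeholders.
with h5py.File('merged.hdf5', 'r') as f:
    data = np.array(f['data/109C.TA_20061103161223_EV'])
filtered = bandpass(data)
```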

smousavi05 commented 3 years ago

@chengxinjiang those models were built from different datasets and different amounts of augmentation (e.g., added noise). However, if the only reason you want to retrain the network is the slight difference between your data's frequency band and STEAD's, I don't think that would be necessary. The trained model should work fine on your data. But if there is another reason, like a high false-negative or false-positive rate, I would be happy to know what is unique about your data.

chengxinjiang commented 3 years ago

@smousavi05 thanks for the quick response. The reason I am considering re-training is that the provided default model seems to miss some obvious phases in my data. Since STEAD contains earthquake data from different tectonic settings, I am not sure how unique my data is, but for your reference, my study region is in eastern Indonesia and includes a lot of slab seismicity. As a benchmark I used the ISC catalog, which contains around 600 events in the time span I was looking at, with magnitudes between 2.5 and 4.5. EQTransformer does a good job on many of them but fails to produce enough picks for about 20% of them (with some obvious phases not picked). Here are two snapshots showing the waveform data, with predicted P/S phases marked in red and EQT picks in blue (P) / red (S). Event 1 is from a relatively deep source (180 km) and the other from a shallow depth of 35 km. It would be great to get some insight from you on how this happens and how to improve it.

Cheers, Chengxin

filefolder commented 3 years ago

As a tangent (sorry), I have been experimenting with training models on the entire STEAD dataset with parameters pretty close to the EqT defaults (batch size 256), and here is the best I can do before the models invariably stop improving, usually after ~12 runs regardless of batch size.

```
last loss:            0.00864727245721
last detector_loss:   0.0253163
last picker_P_loss:   0.00655299
last picker_S_loss:   0.00818051
last detector_f1:     0.973535
last picker_P_f1:     0.593209
last picker_S_f1:     0.462458
```

I would be curious how others have fared, because this seems very poor relative to the model described in the paper, which (as far as I can tell) was trained on the same ~1.2M-trace global STEAD dataset but manages 0.98+ F1 scores for S and P.

I assume the paper model is also different from the provided EqT_model.h5 and EqT_model2.h5 files?

smousavi05 commented 3 years ago

Thanks for sharing these. I am always interested to learn about cases where EqT didn't perform well. First of all, your data has some out-of-distribution characteristics: the seismograms you shared look like deep events recorded relatively far from the source, while most of the waveforms in STEAD are from epicentral distances < 110 km and source depths shallower than 20 km. So I recommend retraining the network on a combination of STEAD and your own data (maybe bandpass filtering all waveforms to a narrower band, like 3-10 Hz, as well). Including some of your data could really help if you have good picks to label it.
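To make that concrete, here is a rough sketch of how you could append your own labeled traces to a STEAD-style HDF5/CSV pair for retraining. The attribute and column names follow the STEAD convention as far as I remember it; please verify them against your copy of the dataset and your EQT version:

```python
import os
import h5py
import pandas as pd

def append_trace(h5_path, csv_path, name, data, p_sample, s_sample):
    """Append one hand-picked (6000, 3) trace in a STEAD-style layout."""
    with h5py.File(h5_path, 'a') as f:
        dset = f.create_dataset(f'data/{name}', data=data.astype('float32'))
        # The EQT data generator reads labels from the dataset attributes;
        # other STEAD attributes (e.g. coda_end_sample) may also be needed,
        # depending on the trainer version.
        dset.attrs['trace_category'] = 'earthquake_local'
        dset.attrs['p_arrival_sample'] = p_sample
        dset.attrs['s_arrival_sample'] = s_sample
    # The trainer's input_csv mainly needs the list of trace names.
    pd.DataFrame({'trace_name': [name]}).to_csv(
        csv_path, mode='a', index=False, header=not os.path.exists(csv_path))
```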

Regarding your training, your loss and F1 scores look fine to me. F1 for P and S never reached 0.98, as far as I remember. The F-scores depend strongly on the ratios used for augmentation and on the dropout rate, and you can adjust these based on your data. For instance, I don't think you need add_event_r = 0.6, since your events are pretty regional and more or less fill the entire 1-minute window, leaving no room for a second event. On the other hand, your data is pretty noisy, so if you end up using your own data you can reduce the add_noise ratio; but if you want to rely only on STEAD, you might want to be even harsher and add more noise.

I think EqT_model2 is the one used in the paper, and I built EqT_model later. One more thing: I think I changed the detection labeling from S arrival + 1.4 × (S − P time) to S arrival + 1.1 × (S − P time) or S arrival + 1.2 × (S − P time), because most of the waveforms had short codas. Whether or not your data has longer coda tails, reducing the window length always improves the F1 score, since the higher-energy parts are easier to learn.
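For reference, here is a minimal sketch of how these knobs appear in a trainer call. The parameter names follow the EQTransformer trainer API as I remember it, the paths are placeholders, and the ratio values are illustrative rather than recommendations:

```python
from EQTransformer.core.trainer import trainer

# Sketch only: verify parameter names against the docs of your EQT version.
trainer(input_hdf5='combined.hdf5',    # STEAD plus your own labeled traces
        input_csv='combined.csv',
        output_name='retrained_model',
        augmentation=True,
        add_event_r=0.0,   # regional events fill the window; no room for a second
        add_noise_r=0.5,   # harsher noise augmentation if training on STEAD alone
        drop_rate=0.1,
        batch_size=256,
        epochs=100)
```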

Hope this helps,

Good Luck, Mostafa

chengxinjiang commented 3 years ago

Thanks for the detailed comments, Mostafa. Will give it another try. I am closing this issue.