smousavi05 / EQTransformer

EQTransformer, a python package for earthquake signal detection and phase picking using AI.
https://rebrand.ly/EQT-documentations
MIT License
301 stars 148 forks

Getting many false positive and false negative #95

Closed saeedsltm closed 2 years ago

saeedsltm commented 2 years ago

I've prepared everything according to the documentation and tutorial for my local data, which consists of 34 short-period stations with a 100 Hz sampling rate, all located within 50 km^2. As described in the tutorial, I ran EQT on just one day of my whole data set. The final output (Y2000.pha) includes more than 200 (so-called) events, but most of them are fakes, and it also failed to detect some true events (magnitude range 0-2.0) that we detected manually using a routine STA/LTA procedure. Should I change any parameters, such as the detection, P, or S thresholds? Note that although the region is noisy, applying an appropriate filter (5-15 Hz) makes it possible to detect events and pick P and S phases manually.

smousavi05 commented 2 years ago

@saeedsltm could you share the output plots (especially those with spectrograms) of those false positives (fake events)? Please also share the parameters you used. If noise is the main issue, you can apply a narrower band-pass filter (e.g. 5-13 Hz) first and then run EqT, but before that we should make sure that the issue is not related to the instrument types.
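To make the band-pass suggestion concrete, here is a minimal sketch of zero-phase filtering a 100 Hz trace to 5-13 Hz before handing it to the picker. It uses SciPy on a synthetic trace; the helper name `bandpass`, the filter order, and the synthetic signal are assumptions for illustration and are not part of EqT. On real miniSEED data, ObsPy's `Stream.filter("bandpass", freqmin=..., freqmax=...)` would be the more usual route.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, fs, fmin, fmax, order=4):
    """Zero-phase Butterworth band-pass of a 1-D trace sampled at fs Hz.

    Hypothetical helper for illustration; order=4 is an assumed default.
    """
    sos = butter(order, [fmin, fmax], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering avoids shifting phase onsets in time.
    return sosfiltfilt(sos, data)


# Synthetic 100 Hz trace: strong 1 Hz noise plus a weaker 9 Hz signal.
fs = 100.0
t = np.arange(0, 60, 1 / fs)
low = np.sin(2 * np.pi * 1.0 * t)
high = 0.2 * np.sin(2 * np.pi * 9.0 * t)

# Band suggested in the thread (5-13 Hz); the 1 Hz component is removed.
filtered = bandpass(low + high, fs, fmin=5.0, fmax=13.0)
```

The filtered trace should closely follow the 9 Hz component away from the trace ends. The best band depends on the station, so the corner frequencies are worth tuning per site.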

saeedsltm commented 2 years ago

@smousavi05, well, I used the following, which is what is mentioned in the tutorial:

detection_threshold=0.3,
P_threshold=0.1,
S_threshold=0.1, 

Using the above parameters with "EqT_model2.h5" as the input model, I got many events that are likely non-earthquake tremors, and for those that are certainly earthquakes, I missed a lot of P picks and got neither P nor S for some stations. I then decided to increase the thresholds:

detection_threshold=0.6,
P_threshold=0.7,
S_threshold=0.7, 

but I am still missing P phases for some stations. I also changed the input model to "EqT_model.h5", and this time I got meaningfully fewer detected events, but it seems I lost many small events. Regarding your suggestion about applying a filter before running EQT, do you mean applying the filter to the whole continuous data set, or should I do something inside the EQT code? And what do you mean by "make sure that issue is not related to the instrument types"? In my case all stations are short-period velocity meters (BHZ,N,E).

smousavi05 commented 2 years ago

@saeedsltm are you using high threshold values (i.e. detection_threshold=0.6, P_threshold=0.7, S_threshold=0.7) for "EqT_model.h5" as well? If so, you need to reduce them; you can go as low as detection_threshold=0.2, P_threshold=0.05, S_threshold=0.05.

The "EqT_model.h5" is a more conservative model and should not give you too many false positives.
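As a rough sketch of how the two settings discussed so far could be kept side by side, the snippet below stores the thresholds per model file and builds the keyword arguments for a run. The threshold values come from this thread. The names `PRESETS`, `build_kwargs`, and `run_eqt`, and all directory and file names, are hypothetical. The call to `mseed_predictor` assumes EQTransformer is installed and is only made if you call `run_eqt` yourself.

```python
# Threshold settings quoted in this thread, keyed by model file.
PRESETS = {
    # Tutorial values, tried with the less conservative model.
    "EqT_model2.h5": {
        "detection_threshold": 0.3, "P_threshold": 0.1, "S_threshold": 0.1,
    },
    # Lower bound suggested above for the more conservative model.
    "EqT_model.h5": {
        "detection_threshold": 0.2, "P_threshold": 0.05, "S_threshold": 0.05,
    },
}


def build_kwargs(model, input_dir, stations_json, output_dir):
    """Assemble keyword arguments for one run (hypothetical helper)."""
    thresholds = PRESETS[model]
    for name, value in thresholds.items():
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie in (0, 1), got {value}")
    return dict(
        input_dir=input_dir,
        input_model=model,
        stations_json=stations_json,
        output_dir=output_dir,
        **thresholds,
    )


def run_eqt(kwargs):
    """Run the picker; imported lazily so the presets work without it."""
    from EQTransformer.core.mseed_predictor import mseed_predictor
    mseed_predictor(**kwargs)


# Paths below are placeholders, not taken from the thread.
kwargs = build_kwargs(
    "EqT_model.h5", "downloads_mseeds", "station_list.json", "detections"
)
```

Keeping the thresholds tied to the model file makes it harder to accidentally reuse the high values with the conservative model.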

Although you can change the filtering inside EqT as well, I think it would be easier to just apply the band-pass filtering to your continuous data up front. EqT has worked well in many cases, but sometimes the results are not that good, mainly because the instrument distributions and their responses are very different from those of the instruments in the training set. You could consider retraining or transfer learning of our model, but it would be easier to use the detected events and run a secondary post-processing step like this: "Siamese Earthquake Transformer: A pair‐input deep‐learning model for earthquake detection and phase picking on a seismic array", or simply do a secondary template matching using packages like EQcorrscan. The false positives are not the main concern, as they will get removed automatically during the association and location. I think you should concentrate on reducing the false negatives.
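To illustrate why isolated false positives tend to drop out at the association stage, here is a deliberately simple sketch: picks are grouped by time, and a group is kept only if enough distinct stations contributed. This is not EqT's associator. The function name `associate`, the 5 s window, the four-station minimum, and the pick times are all assumptions chosen for illustration; only the station names come from this thread.

```python
def associate(picks, window=5.0, min_stations=4):
    """Group (station, time_s) picks and keep groups seen on enough stations.

    A group collects picks within `window` seconds of its earliest pick.
    """
    events = []
    current = []
    for station, t in sorted(picks, key=lambda p: p[1]):
        # Close the current group once a pick falls outside its window.
        if current and t - current[0][1] > window:
            if len({s for s, _ in current}) >= min_stations:
                events.append(current)
            current = []
        current.append((station, t))
    # Flush the last open group.
    if current and len({s for s, _ in current}) >= min_stations:
        events.append(current)
    return events


# Hypothetical picks: one event seen on four stations, one stray pick.
picks = [
    ("ABTL", 100.0), ("NOKA", 100.8), ("TOLS", 101.5), ("KHIA", 102.1),
    ("AHRM", 300.0),  # isolated pick, dropped for lack of support
]
events = associate(picks)
```

A stray detection on a single station never reaches the station count and is discarded, while a missed pick can never be recovered at this stage, which is why false negatives matter more.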

saeedsltm commented 2 years ago

Dear @smousavi05

> are you using high threshold values (i.e. detection_threshold=0.6, P_threshold=0.7, S_threshold=0.7) for "EqT_model.h5" as well? If so, you need to reduce them; you can go as low as detection_threshold=0.2, P_threshold=0.05, S_threshold=0.05.

I used low thresholds (detection_threshold=0.1, P_threshold=0.05, S_threshold=0.05) with "EqT_model.h5". Some of the EQT output picks are linked below (displayed with SeisComP). For some stations both P and S are really good (ABTL station), for some we have biased P or S picks (NOKA station), and for others there is no P pick even though there is a clear P onset (TOLS, KHIA, AHRM stations). These stations use the same equipment and are placed not very far from each other; of course, some are noisier than others and may require a different filtering range.

ABTL station: https://drive.google.com/file/d/1-b6vJXnzvtfQ_9soMvR_XaCAKlFN41qO/view?usp=sharing
NOKA station: https://drive.google.com/file/d/1m5ebd1Zr8nCvWOYQOPyxiKNN2Wchc3QE/view?usp=sharing
TOLS station: https://drive.google.com/file/d/1k1_2afjOLD491tHJ2Q4gDczV9F3kquJ9/view?usp=sharing
KHIA station: https://drive.google.com/file/d/1acohVzrlr0St2iccrVf0Oe1Xvj37hFvI/view?usp=sharing
AHRM station: https://drive.google.com/file/d/1rYqDMUsqZg4-c4uoPLMCBSlQV2qoDabm/view?usp=sharing

> Although you can change the filtering inside the EqT as well but I think it would be easier to just apply the bandpass filtering to your continuous data up front.

For filtering, I made some changes to the preprocessor module to handle some filter options. I got better results, compared against manual picks, by using a frequency range of 1-10 Hz. But as I mentioned, we still have some false negatives. "EqT_model2.h5" gives me lots of false positive events even when I use very large thresholds (>0.8). The only problem is that we have missed some picks on some stations.
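One way to put numbers on the comparison with manual picks is to match automatic pick times to manual ones within a tolerance, then count what is left on each side. The sketch below does a greedy one-to-one match for a single station and phase. The function name `match_picks`, the 0.5 s tolerance, and the pick times are assumptions for illustration, not values from this data set.

```python
def match_picks(manual, auto, tol=0.5):
    """Greedily match automatic to manual pick times (seconds).

    Returns (matched pairs, missed manual picks, unmatched automatic picks).
    """
    remaining = sorted(auto)
    matched, missed = [], []
    for m in sorted(manual):
        # Closest automatic pick that has not been used yet, if any.
        best = min(remaining, key=lambda a: abs(a - m), default=None)
        if best is not None and abs(best - m) <= tol:
            matched.append((m, best))
            remaining.remove(best)
        else:
            missed.append(m)  # false negative
    return matched, missed, remaining  # remaining = false positives


# Hypothetical P-pick times for one station.
manual = [10.0, 50.0, 90.0]
auto = [10.2, 70.0, 90.1]
matched, missed, extra = match_picks(manual, auto)
```

Running this per station for each filter band and threshold set gives a direct count of missed and spurious picks, which makes it easier to see whether a change actually reduces the false negatives.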