Discussion Request About PhaseNet for Microseismic

iqram1337 commented 3 months ago

Hello, Mr. Zhong.

I apologize in advance for sending this message through the issues section of your repository. I am Iqram, an undergraduate student of Geophysical Engineering, Indonesia.

I am currently doing my bachelor thesis on applying PhaseNet to geothermal microseismic data, training with SeisBench. However, I ran into a problem: the P and S labels are too close together. I would like to discuss this with you, if you allow it.

I appreciate your time in responding to my request and hope to get your contact email, thank you very much.

Regards, Iqram.

zhong-yy commented 3 months ago

Hi @iqram1337, do you mean that in your training data the manually picked P and S arrivals are very close? How close are they? Can you give an example of the labels?

Is your problem related to this issue in SeisBench? https://github.com/seisbench/seisbench/issues/221

iqram1337 commented 3 months ago

This is the example:

Or is it normal to get an N, P, and S distribution like this one? (sigma: 10 picture)

By the way, I have read your submitted paper about volpick, and it is very cool and impressive. I honestly thought that volpick would be my solution for geothermal microseismic data because its characteristics are similar to those of volcanic earthquakes, but it turns out that the performance is still not suitable. Maybe I need to do transfer learning on the volpick model?

If you were me, having about 9000 P and S data, what would you do?

1. Transfer learning using the original PhaseNet weights,
2. Transfer learning using the volpick weights,
3. Training from scratch.

Thank you

zhong-yy commented 3 months ago

Hi @iqram1337 ,

The P and S label curves are somewhat biased when P and S are too close. This is because PhaseNet uses softmax as its output, which requires the labels to sum to 1.
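A minimal NumPy sketch of this effect, under stated assumptions: Gaussian labels with a peak of 1, and a simple rescaling so that the noise, P, and S channels sum to 1 at every sample. The function names and the rescaling rule are illustrative only and are not taken from SeisBench or PhaseNet; the exact way a given labeller enforces the constraint may differ.

```python
import numpy as np


def gaussian_label(onset, length, sigma):
    """Gaussian label curve with a peak of 1 at the onset sample."""
    x = np.arange(length)
    return np.exp(-((x - onset) ** 2) / (2 * sigma**2))


def softmax_style_labels(p_onset, s_onset, length=200, sigma=10.0):
    """Noise, P and S curves constrained to sum to 1 at every sample.

    Assumption: where P + S would exceed 1, both are scaled down
    proportionally and the noise channel takes whatever is left.
    """
    p = gaussian_label(p_onset, length, sigma)
    s = gaussian_label(s_onset, length, sigma)
    scale = np.maximum(p + s, 1.0)  # only rescale where the sum exceeds 1
    p, s = p / scale, s / scale
    noise = 1.0 - p - s
    return noise, p, s


# Well separated picks: the P peak stays at about 1.
_, p_far, _ = softmax_style_labels(p_onset=50, s_onset=150)
# Picks 10 samples apart with sigma = 10: the P peak is pushed below 1.
_, p_close, _ = softmax_style_labels(p_onset=100, s_onset=110)
print(p_far.max(), p_close.max())
```

With close picks the two Gaussians overlap, so neither curve can keep its full height once the sum-to-1 constraint is applied, which is the bias described above.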

You can try EQTransformer, which uses sigmoid as its output, so that P and S do not affect each other. In this case, you need to disable the noise label in the ProbabilisticLabeller, for example:

sbg.ProbabilisticLabeller(
    shape="gaussian",
    label_columns=phase_dict,
    noise_column=False,
    sigma=10,
    dim=0,
),
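As a rough illustration of what independent (sigmoid-style) labels look like, here is a NumPy-only sketch. It does not call SeisBench; `independent_labels` is a hypothetical helper, and the onsets are toy values.

```python
import numpy as np


def independent_labels(onsets, length, sigma=10.0):
    """One Gaussian channel per phase, with no noise channel and no rescaling."""
    x = np.arange(length)
    return np.stack(
        [np.exp(-((x - onset) ** 2) / (2 * sigma**2)) for onset in onsets]
    )


# P at sample 100 and S at sample 110 (toy values), sigma = 10.
labels = independent_labels([100, 110], length=200)
print(labels.shape)        # (2, 200): one row for P, one for S
print(labels.max(axis=1))  # each channel keeps its own peak
```

Because the channels are not forced to sum to 1, each curve keeps its full height even when the picks are only a few samples apart.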

Alternatively, maybe you can add an option to change sigma according to the S-P difference (I haven't tried it before)? For example, change the following code https://github.com/seisbench/seisbench/blob/266ef8d77cd2687e9395000fb85d96dff18ea25b/seisbench/generate/labeling.py#L281-L283 to

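# Collect the P and S onsets first. The lists are created outside the loop
# so that they are not reset on every iteration.
p_list = []
s_list = []
for label_column, label in self.label_columns.items():
    if label == "P":
        p_list.append(metadata[label_column])
    if label == "S":
        s_list.append(metadata[label_column])
min_s_p = ...  # calculate the minimum difference between P and S in p_list and s_list
...
# Narrow the Gaussian when P and S are closer than 4 * sigma
if min_s_p < 4 * self.sigma:
    sigma1 = 0.25 * min_s_p
else:
    sigma1 = self.sigma
...
for label_column, label in self.label_columns.items():
    i = self.label_ids[label]
...
                    label_val = self._labelshape_fn_mapper[self.shape](
                        onset=onset, length=X.shape[width_dim], sigma=sigma1
                    )
...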
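The sigma selection above can be tried outside SeisBench with a small standalone sketch. The helper names `min_s_minus_p` and `adaptive_sigma` are hypothetical, onsets are assumed to be sample indices, and missing picks are assumed to be `None` or NaN. This is one possible way to compute the minimum S-P difference, not a tested patch to the labeller.

```python
import numpy as np


def _valid(onsets):
    """Drop missing picks (None or NaN) and return a float array."""
    return np.array(
        [v for v in onsets if v is not None and not np.isnan(v)], dtype=float
    )


def min_s_minus_p(p_onsets, s_onsets):
    """Smallest positive S - P separation in samples, or None if there is no pair."""
    p, s = _valid(p_onsets), _valid(s_onsets)
    if p.size == 0 or s.size == 0:
        return None
    diffs = s[:, None] - p[None, :]  # every S onset minus every P onset
    diffs = diffs[diffs > 0]         # keep only S arrivals after P
    return float(diffs.min()) if diffs.size else None


def adaptive_sigma(p_onsets, s_onsets, sigma=10.0, factor=4.0):
    """Return gap / factor when the S-P gap is below factor * sigma, else sigma."""
    gap = min_s_minus_p(p_onsets, s_onsets)
    if gap is not None and gap < factor * sigma:
        return gap / factor
    return sigma


print(adaptive_sigma([100], [120]))  # gap of 20 samples is below 4 * 10
print(adaptive_sigma([100], [200]))  # gap of 100 samples keeps the default
```

With `factor=4.0` this reproduces the rule in the snippet above: `sigma1 = 0.25 * min_s_p` when `min_s_p < 4 * sigma`.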

For transfer learning, I would try all three options you mentioned, then compare the performance of the three models on a validation set to see which one is better.
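One possible way to set up such a comparison is sketched below, assuming picks are given as sample indices and that a predicted pick counts as correct when it falls within a chosen tolerance of a manual pick. The function names, the model names, the tolerance, and the pick values are all hypothetical toy choices, not results.

```python
def match_picks(true_picks, pred_picks, tolerance):
    """Greedy one-to-one matching; returns (true pos., false pos., false neg.)."""
    remaining = sorted(pred_picks)
    tp = 0
    for t in sorted(true_picks):
        if not remaining:
            break
        best = min(remaining, key=lambda p: abs(p - t))  # closest unmatched pick
        if abs(best - t) <= tolerance:
            tp += 1
            remaining.remove(best)
    return tp, len(pred_picks) - tp, len(true_picks) - tp


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0 when nothing matched)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_model(true_picks, predictions, tolerance=10):
    """Score every model on the same validation picks and return the best one."""
    scores = {
        name: f1_score(*match_picks(true_picks, preds, tolerance))
        for name, preds in predictions.items()
    }
    return max(scores, key=scores.get), scores


# Toy validation picks and toy predictions from three hypothetical models.
manual = [100, 300]
predicted = {
    "phasenet_finetuned": [102, 305],
    "volpick_finetuned": [100, 400],
    "from_scratch": [50],
}
print(best_model(manual, predicted))
```

In practice the same scoring would be run separately for P and S, and pick residual statistics could be reported alongside F1.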

iqram1337 commented 3 months ago

Thank you very much Mr. Zhong, I will try out all your suggestions.
