difficulty matching observed and synthetic waveforms

filefolder commented 1 year ago

Hi again-

We have not been able to get nice correlations between synthetic and observed waveforms in Gisola. Using the same data / events in Isola (or other similar codes) generally works very well with high correlation fits so it is not clear what is going wrong.

Looking at the waveforms plotted in the output webpage, the synthetic data appears to be correct / matches the independently filtered observed data, but the observed waveforms in Gisola are significantly longer and look "more stretched out" than they should be. In the below image our filter is [0.01, 0.04, 0.12, 0.14] but it shows a signal 200+ seconds long. In reality it is only 40-50 seconds at most, similar to the synthetic prediction.

gisola_example

The event in this example is a mag 5.6, 19km deep, with stations between 25-83 km away. For simplicity we are only using one frequency filter band (as above) our velocity model is below. It has proven to produce very good fits using other (older) MT codes so I wouldn't suspect that is the problem, unless it is malformed somehow.

Crustal model                       model
depth of layer top(km)   Vp(km/s)    Vs(km/s)    Rho(g/cm**3)
0.0 2.60 1.49 1.08 300 150
2.0 3.54 1.77 1.48 300 150
4.0 4.46 2.19 1.86 300 150
6.0 5.45 2.78 2.27 300 150
8.0 5.79 3.05 2.41 300 150
10.0 6.20 3.38 2.58 300 150
12.0 6.48 3.62 2.70 300 150
14.0 6.81 3.81 2.84 300 150
16.0 7.04 3.96 2.93 300 150
18.0 7.22 4.12 3.01 300 150
20.0 7.30 4.14 3.04 300 150
22.0 7.17 4.13 2.99 300 150
24.0 6.89 4.10 2.87 300 150
26.0 6.66 4.07 2.78 300 150
28.0 6.51 4.04 2.71 300 150
30.0 6.44 4.03 2.68 300 150
32.0 6.59 4.05 2.74 300 150
34.0 6.82 4.08 2.84 300 150
36.0 7.10 4.13 2.96 300 150
38.0 7.37 4.24 3.07 300 150
40.0 7.65 4.39 3.19 300 150
42.0 7.75 4.46 3.23 300 150
44.0 7.85 4.51 3.27 300 1000
56.0 7.92 4.54 3.30 300 1000

To isolate the issue I have allowed only one inversion time window [3.6, 6.9, 245.76] and timeshift [3.6, 6.9, [-67, 10, 167]] option. (I also noticed that despite the event being a 5.6, the window was always 245 seconds when it should have been 409, but I think that may be a separate issue.)

Any ideas what is happening, or have you experienced anything similar? Any help appreciated, thanks!

esokos commented 1 year ago

It is hard to say what it going on here !

Since i don't have direct access to the whole inversion folder i will try to suggest a few things to be checked a) is the Origin Time given to gisola correct ? to eliminate the possibility that the code selected a part of noise instead of data. b) at least for MRO10,12 it looks as if an instrumental disturbance exists .. is the "mouse" detection enabled.? c) crustal model looks very detailed (i am not sure of gisola can handle so many layers!) but it seems it is ok since synthetic data look reasonable d) it would help to see/have the raw data from these stations

sorry i cannot think of something else now! t.

nikosT commented 1 year ago

@filefolder, does the streams_corrected.mseed file contain the observed data as expected? However, as @esokos suggested, could you please share the entire results folder for the specific event? My intuition says unless there is a configuration issue, the metadata of these stations might have wrong periods. Do you experience it with other events (that have other stations associated)? About, the crustal model (it supports up to 200 layers -theoretically-), Unless you modified the file: https://github.com/nikosT/Gisola/blob/main/src/core/inversion/parameters.f90. If you use for instance Novotny crustal layer do you see the same observed period? About the last issue, my intuition also says here that for some reason it takes a different magnitude value, but I can not say unless we have access to the working directory.

filefolder commented 1 year ago

Thanks both for the replies; I will host some example data as well as the results from an alternate code tomorrow.

A colleague is actually running that code but the source data is entirely from the Gisola output, so I suspect my settings are wrong somehow rather than the data being off.

filefolder commented 1 year ago

So I am looking at the output webpage with fresh eyes and I think the issue is in the pre-processing of the waveforms. In particular there seems to be too much low frequency content being carried over...

Here is the full picture of the initial problem solution in which it is clear there is too much long period signal present gisola_bad

In stream.py, this seems sensible for the most part

        st.remove_response(inventory=inv, 
                       output='VEL', # output units in Velocity (m/s)
                       pre_filt=(0.001, 0.002, 8, 9), # bandpass frqs (Hz)
                       zero_mean=True, # detrend(demean)
                       taper=True, # cos taper from f1 to f2 and from f3 to f4
                       taper_fraction=0.05 # percentage of tapering
                       )

But my actual requested filter doesn't seem to be applied anywhere following this (that I can see, at least.. haven't looked very hard). If I manually change the pre_filt values to my given filter (e.g. 0.03 to 0.04 tapered min freq) the resulting waveforms (and fits!) are much MUCH nicer. I am not sure if that is meant to be done here or elsewhere.

Second, and in this case probably very closely related to the first issue, I find that when my sensors are offset / out of level, then a higher taper_fraction is necessary to reduce spurious artifacts when filtering as their values will be non-zero at the beginning of the trace. In my original copy of your code this was set to 0.02, I see now it is 0.05, I might even suggest raising this to as high an amount as possible given the windowing. 0.10 still gives me enough space for in most instances.

Applying my minimum filter and increasing the taper to 0.1 produces the below, which still needs a lot of work, but at least is starting to look sensible. I think there still may be some discrepancy between the frequency content in the synthetic and observed waveforms but I haven't found anything definitive yet.

gisola_good

Tangentially, there are clear amplitude problems relating to these being OBS with some on sediment, some on rock, strange attenuation environments, and possibly even response gain inconsistencies. I don't know if this is a Very Bad Idea or not but I wonder if an option to "normalize" the amplitudes of both synthetic and observed waveforms to 1 may help improve solutions, or at least help users better diagnose and configure settings to for better fits. As it stands I can't see what's going on with many of these stations.

Attaching also the streams, streams_corrected, and inventory in case it helps diagnose anything, as well as the output log.txt. If you need anything more let me know. info.zip log.txt

esokos commented 1 year ago

Hi @filefolder ,

the problem is, as you said, in the waveforms. i had a look at info.zip and i saw lots of noisy and clipped traces. So first of all the clipped traces should be removed, i am not sure if Gisola manages this 100% correctly, it needs checking, @nikosT could comment on the algorithm that detects the clipping and of course a manual check is needed.

Besides clipping i saw "disturbances" in the data (see here Zahradník, J., Plešinger, A., (2005). Long-period pulses in broadband records of near earthquakes Bul. Seism. Soc. Am., Vol. 95, No. 5, 1928-1939), these are also a big problem. Gisola has a way to detect this also, but again i am not so sure if it is 100% correct.

Then you should find a freq band that signal is clearly seen and apply this in inversion. By the way the filtering for the inversion is not done during the instrument correction stage, it is done latter in the fortran code before the inversion, this is why you don't see it in st.remove_response.

You are right about the tapering, it is needed in such cases but i think most of help would come from using a different pre filtering (i.e., pre_filt=(0.001, 0.002, 8, 9), # bandpass frqs (Hz)) make it more strict in lower end to remove the long period waves..

By the way, this event looks very noisy for 5.6 at such distances !!(at least for land !!) and for such a "detailed" work i would recommend using the manual ISOLA (at least for a 1-2 events) to understand better problems with data which station to trust etc.

t.

filefolder commented 1 year ago

Thanks, and yes this is a terrible example, just the one I was working on when I tried to give Gisola another look. I doubt I will get a solution here. And yes there are indeed clip detection routine but I have disabled it and the others for debugging.

The main issue I think is that the code does not apply the requested filter band in the config.yaml file to the observed data, which seems to be a bug somewhere? Only when I manually replace the pre_filt variable with my filter in the remove_response call in response.py do the results begin to look close. Probably the pre_filt variable should remain as it is is, and the target band filter should be applied later, but I am not sure where would be best.

esokos commented 1 year ago

i don't think i have your configuration file, anyway the place is here in config.yaml

Frequency:

[4.0, 5.5, [0.04, 0.05, 0.08, 0.09]]
[5.6, 5.9, [0.02, 0.03, 0.06, 0.07]]
[6.0, 7.0, [0.01, 0.02, 0.05, 0.06]]
[6.0, 7.0, [0.007, 0.008, 0.02, 0.03]]
[7.1, 9.0, [0.007, 0.008, 0.02, 0.03]]

so i guess you should have something like this

[3.6, 6.6, [0.01, 0.04, 0.12, 0.14]]..?

maybe change to

[3.6, 6.6, [0.12, 0.14, 0.18, 0.19]]..?

nikosT commented 1 year ago

Hi @filefolder, the modules for data filtering (and the algorithms that each one apply) are referred here: https://github.com/nikosT/Gisola/wiki/Modules. Extensive description can be found in the paper: https://doi.org/10.1785/0220210153

filefolder commented 1 year ago

Thanks again for the help.

I can confirm that the code is reading my filters correctly from that config.yaml, but sometimes it does seem the filter isn't applied and I can't figure out why. I'll keep at it.

Possibly related (and final!) question regarding the magnitude value that determines the frequency (and timeshift, and window, etc) values, is this the magnitude given in the supplied QuakeML file or the MLv magnitude calculated independently in your code? This can be an issue for some of our solutions as we are not using MLv, but rather MLa or MLr which can be quite a bit smaller. Additionally, if there is no given magnitude in the source QuakeML, is this then calculated independently or is some default value assumed?

esokos commented 1 year ago

this could be an explanation but it is a question for @nikosT !! How the code derives the magnitude from the QuakeML file (there is no magnitude calculation in the gisola code, besides Mw of course)

nikosT commented 1 year ago

Hi @filefolder, for real-time use, you can set the Magnitudetype at the Watcher block of the configuration file. For instance, Mlh. This depends on what types the event's source (e.g. FDSNWS-event) supports. In offline mode, it will get the associated magnitude (value, type, uncertainty, etc.) with the preferred_origin as it is set in the QuakeML file. If not set, it will get the most recent committed to the file. If the historical flag is set in the offline mode, it will retrieve the first chronologically committed. This is only for catching the performance of an operational procedure. Usually, the more time passes the more accurately the origin will be found. Thus, you should always use the most recent (usually).

In any case, all the rules described in the configuration file are associated with the type of magnitude that is associated to the origin (hypocenter), not focal mecahism's origin.

filefolder commented 1 year ago

Thanks all. I think that answers my questions, so closing this issue.

nikosT / Gisola

difficulty matching observed and synthetic waveforms #39