uafgeotools / mtuq

moment tensor uncertainty quantification
BSD 2-Clause "Simplified" License

Using custom windows (FLEXWIN/PyFlex) #264

Open lsawade opened 5 months ago

lsawade commented 5 months ago

Hi Devs,

I want to quantify the uncertainty of my global CMTs using MTUQ, but to do that, I somehow need to work with the custom windows that I selected in PyFlex for the global long-period case.

Is it possible to implement it?


TL;DR

My global long-period use case is somewhat niche compared to local/regional cases. Windows are at times fairly long but irregular, and I use 3 different wave categories, each with different windows. I chatted with @thurinj at SSA, and he mentioned that it would be best to just open a ticket to start discussing this.

So, what have I done so far?

I have already implemented a rough Green's function client for MTUQ that takes in local subsets of the Green's functions computed with my non-merged version of specfem3d_globe.

What do I need?

To truly test MTUQ with my inverted moment tensors, I need to make approximately the same measurements as I did in my inversion, hence the requirement for custom windows.

What am I willing to do?

If this is at all possible within the MTUQ framework, I'm more than happy to implement it, as this would be a great application for the moment tensors. So, if possible, I'm willing to do the leg work -- under guidance of course!

thurinj commented 5 months ago

Hi @lsawade, thanks for following up and opening an issue!

It would make total sense to have an additional window category (`custom`?) with completely user-defined time windows (or, in your case, populated by another piece of software in the processing chain).

I don't foresee specific challenges in implementing this; pretty much everything happens in `mtuq/process_data.py`, and you can probably start by looking at what happens for each of the existing `window_type` parameters. I don't think it would clash with anything currently implemented.

I also want to stress that it would be a cool opportunity for the code at large, not just for this specific application: with a custom window type we could use multiple body-wave windows (which might be needed in your case), although we won't have any plug-and-play waveform visualization. There's already a cool example of an application to the S1222a Marsquake using a modified version of MTUQ to handle P and S body waves (https://doi.org/10.1029/2023JE007793), but that is unfortunately not currently supported.

We have weight files that are used as input for a lot of tuning parameters (origin time, static time shift, data weight...), but because they are supposed to work with the legacy code CAPUAF, I recommend not using them. How do you store your time-window information (waveform header, external file)? This might guide the way you approach your implementation, but it's better if you can make it general/flexible.

rmodrak commented 5 months ago

If you had, say, a dictionary of window start and end times, it should be fairly quick to get something working.
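Something like this, just as an illustration (the keys and the use of UTCDateTime are one possible convention, not an existing MTUQ format):

```python
from obspy import UTCDateTime

# Hypothetical mapping from trace id to a custom (start, end) window.
# Keys follow the usual NET.STA.LOC.CHA convention.
custom_windows = {
    "IU.ANMO.00.BHZ": (UTCDateTime("2024-05-07T12:01:30"),
                       UTCDateTime("2024-05-07T12:11:30")),
    "II.PFO.00.BHZ":  (UTCDateTime("2024-05-07T12:02:10"),
                       UTCDateTime("2024-05-07T12:12:10")),
}

# During processing, each trace could then be cut to its own window, e.g.
#     tr.trim(*custom_windows[tr.id])
```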

To add to Julien's reply, I would start by glancing at the following:

code: https://github.com/uafgeotools/mtuq/blob/109b4e57a9a1300d14bb1d74f8619d0057bb8bd5/mtuq/process_data.py#L607
docs: https://uafgeotools.github.io/mtuq/user_guide/04.html

One potential wrinkle is that some code optimization depends on having window lengths that are the same from station to station. If your window lengths vary from station to station, then the C-accelerated misfit function may fail, but there is a pure Python version you could fall back to (more details: https://github.com/uafgeotools/mtuq/blob/master/tests/test_misfit.py).

Let me keep thinking about it and respond further after our program review this week!


lsawade commented 4 months ago

Hi @thurinj, hi @rmodrak

Thanks for the swift replies! It does indeed look simpler than expected, and I'd be happy to implement a custom window type and tests for it.

@thurinj, to answer your question about my setup: my current workflow is pretty ugly. I process observed and synthetic data in up to three different categories (body, surface, mantle), then run PyFlex for each category and simply attach a list of PyFlex window objects to the Trace.stats of each corresponding observed trace. Finally, I store the entire stream as a pickle object. It is not ideal in terms of portability, but it reduces the number of files that have to be read and matched.
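Roughly, the attachment step looks like this (simplified; the file names and the pyflex.Config values are placeholders, and the real workflow uses a different config per category):

```python
import pickle
import obspy
import pyflex

# placeholder file names -- in reality these come out of my processing chain
observed = obspy.read("observed_surface.mseed")
synthetic = obspy.read("synthetic_surface.mseed")

# placeholder pyflex configuration for one wave category
config = pyflex.Config(min_period=40.0, max_period=150.0)

# assumes observed and synthetic streams are sorted consistently
for obs_tr, syn_tr in zip(observed, synthetic):
    # pyflex.select_windows returns a list of pyflex Window objects
    obs_tr.stats.windows = pyflex.select_windows(obs_tr, syn_tr, config)

# store the whole stream, windows included, as a single pickle
with open("observed_surface_windows.pkl", "wb") as f:
    pickle.dump(observed, f)
```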

That being said, I think a workflow that is universal to MTUQ is probably the best way to move forward, and I'm happy to preprocess things accordingly.

I guess what it really boils down to is how I can select multiple windows per trace. The SAC reader hints that a station_id can hold the same trace twice, but I don't know whether this would make things weird when comparing observed and synthetics; I have to assume it does. Otherwise, I could just add multiple traces to the same station, each with its own custom window (start and end time picks).

The ugly way that immediately jumped to mind was to check all traces in each category (body, surface, mantle) for the maximum number of windows N, and then build one dataset per window index, from window 1 to window N, for each category. DS_1 would contain almost all stations, while DS_N would contain barely any. But that does not really make sense.

Another approach I thought of would be to keep the entire trace but taper it to zero outside the windows. The downside is the potential weirdness of the cross-correlation (if the waveforms roughly match, that should be fine). This approach would also keep the waveforms in each category at the same length, albeit a lengthy one, of course...
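To sketch what I mean (just an illustration with numpy/obspy; the window here is made up):

```python
import numpy as np
from obspy import read

# obspy's bundled example trace, just for illustration
tr = read()[0]

# made-up custom windows as (starttime, endtime) pairs
windows = [(tr.stats.starttime + 5.0, tr.stats.starttime + 12.0)]

# build a mask that is one inside the windows and zero outside
mask = np.zeros(tr.stats.npts)
for t1, t2 in windows:
    i1 = int((t1 - tr.stats.starttime) * tr.stats.sampling_rate)
    i2 = int((t2 - tr.stats.starttime) * tr.stats.sampling_rate)
    mask[i1:i2] = 1.0

# in practice the mask edges would be tapered (e.g. cosine ramps) rather
# than hard cut, to avoid introducing sharp discontinuities
tr.data = tr.data * mask
```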

Either way, I'll wait for your thoughts, @rmodrak!

thurinj commented 4 months ago

Hi @lsawade,

Coming back to the discussion: the way we process data in MTUQ is to load all three-component data as a Dataset object, which is sort of a fancy list of obspy Streams with a few utility methods. Then we map the same processing object over the whole dataset, so each trace is treated the same way; that is where you would want something different.
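For reference, the existing pattern looks roughly like this (adapted from the grid search examples; all parameter values and paths are placeholders, and `data` stands for a Dataset previously returned by mtuq.read):

```python
from mtuq.process_data import ProcessData

# one processing object per waveform group (placeholder values)
process_sw = ProcessData(
    filter_type='Bandpass',
    freq_min=0.025,
    freq_max=0.0625,
    pick_type='taup',
    taup_model='ak135',
    window_type='surface_wave',
    window_length=150.,
    capuaf_file='path/to/weights.dat',
    )

# the same object is then mapped over the whole dataset, so every trace
# gets the same treatment -- this is the step where a 'custom' window
# type would behave differently per trace
data_sw = data.map(process_sw)
```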

I think the main item, as @rmodrak mentioned, is to get an input with all of your start and end times for each of the traces in your dataset, and to iterate through it with a different windowing setting for each trace. It could be a json/yaml file passed as an additional input parameter alongside window_type, something like:

```python
ProcessData(
    filter_type='Bandpass',
    freq_min=fmin,
    freq_max=fmax,
    pick_type='taup',
    taup_model=model,
    window_type='custom',
    window_file='windows_param.json',
    capuaf_file=path_weights,
)
```

with a check for window_file if window_type == 'custom' (we have a couple of conditional checks of this kind in the process_data.py script anyway).

I imagine you could either have separate files for each waveform group, or one structured file with a section per group. The first option would reduce the implementation complexity, since you wouldn't need any in-code check for the waveform type. Either way, the input file approach could be more robust and straightforward than having to deal with the SAC header?

And as you loop through the whole dataset during the processing, you can directly get the starttime and endtime from the associated input file.
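Just to make the idea concrete (purely hypothetical, none of these keys or file names are an existing MTUQ convention), a structured windows_param.json and its lookup could look like:

```python
import json

# hypothetical layout: one section per waveform group, mapping trace ids
# to [starttime, endtime] strings
example = {
    "body": {
        "IU.ANMO.00.BHZ": ["2024-05-07T12:01:30", "2024-05-07T12:06:30"],
    },
    "surface": {
        "IU.ANMO.00.BHZ": ["2024-05-07T12:06:30", "2024-05-07T12:20:00"],
    },
}

with open("windows_param.json", "w") as f:
    json.dump(example, f, indent=2)

# on the processing side, the custom branch would simply look up each trace
with open("windows_param.json") as f:
    window_table = json.load(f)

t1, t2 = window_table["surface"]["IU.ANMO.00.BHZ"]
```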

Let me know if I've missed something!