uafgeotools / mtuq

moment tensor uncertainty quantification
BSD 2-Clause "Simplified" License
65 stars 22 forks source link

Waveform plot improvements #252

Closed thurinj closed 3 months ago

thurinj commented 4 months ago

I've made a couple of tweaks and additions to the waveform plots. Mainly the two things I wanted to tackle were:

  1. opening up the script a little bit for additional fields in the header. It is not yet a full customizable header, but the added decorator allows to easily add additional fields to the header without messing with all the default parameters and should open up to additions if needed down the line.

  2. tweak the normalization options. I've moved the normalization logic out of the plotting function in order to fix the stream-per-stream normalization, and allow the addition of median based normalization. I haven't changed the default (still based on the maximum value in the dataset). Having the median normalization should help visualize the dataset when big changes in amplitudes are expected (for strong-motion data for instance).

Here is an example of maximum amplitude scaling, with the Surface waves amplitude of the first station artificially increased by a factor 10: 20090407201255351DC_waveforms_max

and here is the corresponding plot using the median amplitude scaling: 20090407201255351DC_waveforms_med

There might be some clipping when using this of normalization, but it could either be tweaked to reduce it. Per-station/stream amplitude is also now working properly.

rmodrak commented 4 months ago

Hi Julien, Thanks for submitting these changes. Some possible discussion questions for the next call below.

    if normalize=='trace_amplitude':
        max_trace = _max(dat, syn)
        ylim = [-1.5*max_trace, +1.5*max_trace]
        axis.set_ylim(*ylim)

    elif normalize=='station_amplitude':
        max_stream = _max(stream_dat, stream_syn)
        ylim = [-1.5*max_stream, +1.5*max_stream]
        axis.set_ylim(*ylim)

    elif normalize=='maximum_amplitude':
        ylim = [-0.75*max_amplitude, +0.75*max_amplitude]
        axis.set_ylim(*ylim)

    elif normalize=='median_amplitude':
        median = _median(stream_dat, stream_syn)
        ylim = [-median, +median]
        axis.set_ylim(*ylim)
thurinj commented 4 months ago

Thanks for the comment Ryan.

I think the main point for the header is that this small change with number of stations is that it isn't distracting, it's compact, and very handy to have, so it would totally make sense to have it as default. Implementing with a decorator without modifying the main Force and MTheader classes was also a way to introduce a way to have easy changes, without having the users to completely re-define a header class.

I think the header could be re-designed in terms of scripting, to allow more dynamic headers and allow more optional fields to be added, but this is beyond this simple addition.

As for the normalization routine, some of the problems were that: stream_dat and stream_syn aren't defined properly, and it is easier to execute the logic before passing the normalization values to the plotting. Otherwise, everything happen within _plot_ZRT or _plot_ZR, where dat and syn are single traces: no way to perform the median scaling nor the stream by stream normalization on the fly if you have only access to a single trace.

That's where pre-computing the scaling feels more manageable and flexible. But let me rewrite the median logic as a function to clean up the code a bit before moving forward.

rmodrak commented 4 months ago

Don't stream_dat and stream_syn provide all available traces from the given station?

UPDATE: Nevermind I see they aren't as you were saying. This might be something I try to correct

carltape commented 4 months ago

Hi Ryan, the header features such as listing the number of stations used are not customizations -- they are features that were required over the course of peer-review on previous publications. The gist is this: if you show a subset of stations in a paper (which is almost always the case), then you need to have a text indicator to distinguish these two cases: 1) 50 stations used and 6 displayed, 2) 6 stations used and 6 displayed. This is identified here: https://github.com/uafgeotools/mtuq/issues/221 We need this information displayed in the text header.

The amplitude scaling issue is a longstanding one (and pre-MTUQ), with the basic issue being that you cannot view waveforms when there is a single high-amplitude outlier. Issue here: https://github.com/uafgeotools/mtuq/issues/249

rmodrak commented 4 months ago

I understand the amplitude normalization issue, addressing this would be a very useful improvement.

For the header, it sounds like two different sets of needs: readability for operational use versus archiving information for later reference.

I'd argue for a readable header as the default, with additional annotations being implemented via a new Header class (either by subclassing, or by writing a new one from scratch). I'd appreciate if we could take this approach as it would reduce the burden on me and Jeremy. (And indeed, you can trade out one header for another in this way.)

PS For reproducibility, there's no substitute for more detailed info archived in the JSON output files?

PSS Note to self, what's up with the high-amplitudes traces getting clipped again in the waveforms plot? I thought this issue was fixed at one point?

thurinj commented 4 months ago

Hi @rmodrak, thanks again for the comments.

As for the stream_dat and stream_syn, these definitely needed a fix, as the normalize='station_amplitude' didn't work at all. Before the update, normalization took either a single value (for 'maximum_amplitude') or the trace by trace amplitude. It can do this because the normalization logic was done when only a single trace is available at a time.

If we want to add other options, such as stats-based (we could do percentile instead of median, for instance), the normalization logic has to happen before the final trace-by-trace plotting function is called, hence the modifications I've made overall, which uses a list of normalization values for each station.

As for the clipping, it wasn't a big issue previously due to the code being limited to maximum amplitude (shouldn't be able to clip) and trace-by-trace normalization (again, shouldn't clip). The median-based normalization's purpose is to tackle and be able to visualize outliers and or account for nearby data/strong motion. Therefore, there might be very different scales. Here is a dramatic example: image

It seems more sensible to turn clipping back on to have something more practical to newer options.

My two cents on the extra field of the header: The number of stations is quite short anyway; it doesn't affect readability in the slightest while also providing some essential info for any use (particularly, as @carltape mentioned, for any kind of communication). As for the implementation: using a decorator also means it doesn't affect the "default" function, and it shows how one might go into further customizing the header: the extra info is easy to add and remove. -- If you prefer, we could modify the header preparation functions and include it as an extra input for the functions -- I'm not too fond of the idea of composing a new header entirely from scratch; it is the least convenient way possible, especially since we have such a nice base to start with.

I haven't modified station annotations, except for fixing the degree's distance option (it wasn't working either).

rmodrak commented 3 months ago

A possible approach based on subclassing:

class NewHeader(MomentTensorHeader):
    def write(self, height, width, margin_left, margin_top):
        super(ExperimentalHeader, self).write(height, width, margin_left, margin_top)

        # write text line #5
        px = 2.*margin_left + 0.75*height
        py = height - margin_top - 1.55
        _write_text("nsta: %s" % len(self.data), px, py, self._axis, fontsize=14)

Then you could just create the header and pass it as a keyword argument to the waveform plotting function.

thurinj commented 3 months ago

Hi @rmodrak ,

Coming back at this topic, I think there is a misunderstanding with this item, as the addition I proposed in the pull request isn't meant to be a custom UAF thing, but a general modification to the default header.

I don't think having the "N-Np-Ns : x-x-x" would be detrimental to any user and it seems like a logical tweak to the default header. Let's think at it the other way, if this was already in the header:

image

I don't think users, at large, would benefit from this field being removed. It's a basic info that is handy to have on the figure, especially with this low of a footprint on the figure.

Would it make more sense to simplify the implementation, remove the decorator and hardcode this field in the header, as with the other parameters? Let me know what you think so I can solve the conflict that popped-in with the last merge, and rework the implementation if needed.

Cheers, Julien

rmodrak commented 3 months ago

There was a suggestion to do it by subclassing but it may not have been very clear-- if you could let me try to illustrate further tomorrow.

On Wed, Apr 3, 2024 at 6:43 PM thurinj @.***> wrote:

Hi @rmodrak https://github.com/rmodrak ,

Coming back at this topic, I think there is a misunderstanding with this item, as the addition I proposed in the pull request isn't meant to be a custom UAF thing, but a general modification to the default header.

I don't think having the "N-Np-Ns : x-x-x" would be detrimental to any user and it seems like a logical tweak to the default header. Let's think at it the other way, if this was already in the header: image.png (view on web) https://github.com/uafgeotools/mtuq/assets/63735192/9ee5192e-11a4-49f0-831c-bc1a3e2bfd19 I don't think users, at large, would benefit from this field being removed. It's a basic info that is handy to have on the figure, especially with this low of a footprint on the figure.

Would it make more sense to simplify the implementation, remove the decorator and hardcode this field in the header, as with the other parameters? Let me know what you think so I can solve the conflict that popped-in with the last merge, and rework the implementation if needed.

Cheers, Julien

— Reply to this email directly, view it on GitHub https://github.com/uafgeotools/mtuq/pull/252#issuecomment-2035872550, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCGSSQAAI535TGLP2MTXB3Y3SV3NAVCNFSM6AAAAABDQHO5ZSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVHA3TENJVGA . You are receiving this because you were mentioned.Message ID: @.***>

thurinj commented 3 months ago

Thanks Ryan, but wouldn't that be a "custom" header and not the de-facto one, if we do so by subclassing the main header object?

rmodrak commented 3 months ago

Hi Julien, As far as expanding the default header-- the difficult thing is that currently, the examples do not pass enough information to the plotting functions. I think we would want to pass this new information in a way that's backward compatible (so the old function call continues to work) and not dependent on scope in the way that a decorator might be.

rmodrak commented 3 months ago

As far as the bigger picture-- continuing to support inheritance as a way of customizing figure headers I think would be beneficial. In fact, the MomentTensor and ForceHeader already use inheritance. My impression is that a decorator is a somewhat different, possibly conflicting means to this same end? Whatever the case, there is a specific idea that I would like to run by you via a pull request, if that's alright...

thurinj commented 3 months ago

Hi Ryan,

I went back and check the last updated version of the code, looking at the conflicts. The addition you've made here:

class MomentTensorHeader(SourceHeader):
    """ Writes moment tensor inversion summary to the top of a 
    matplotlib figure
    """
    def __init__(self, process_bw, process_sw, misfit_bw, misfit_sw,
        best_misfit_bw, best_misfit_sw, model, solver, mt, lune_dict, origin,
        data_bw=None, data_sw=None, mt_grid=None, event_name=None):

        if not event_name:
           # YYYY-MM-DDTHH:MM:SS.??????Z
           event_name = '%s' % origin.time

           # trim fraction of second
           event_name = event_name[:-8]

        self.event_name = event_name

        # required arguments
        self.process_bw = process_bw
        self.process_sw = process_sw
        self.misfit_bw = misfit_bw
        self.misfit_sw = misfit_sw
        self.best_misfit_bw = best_misfit_bw
        self.best_misfit_sw = best_misfit_sw
        self.model = model
        self.solver = solver
        self.mt = mt
        self.lune_dict = lune_dict
        self.origin = origin

        # optional arguments, reserved for possible future use
        # (or for use by subclasses)
        self.data_bw = data_bw
        self.data_sw = data_sw
        self.mt_grid = mt_grid

would probably enable to add the extra field without having to actually define a new class, don't you think? The reason for me implementing the decorator in the first place was to avoid modifying the arguments for the MomentTensorHeader and ForceHeader classes, but since they now accept data, and that data are always passed onto the high-level plotting functions, we could go ahead with this and have the header field.

Having the grid, is actually interesting, as it would be a good gateway for having an indicator on the grid type (i.e., having the lune coordinate be displayed as $0^{\star}$ $0^{\star}$ for DC grid, and $n$ $0^{\star}$ for a deviatoric grid to indicate which coordinates are fixed).

Would using these fields work for you?

thurinj commented 3 months ago

I've reverted the changes pertaining to the header. The PR now only contains the normalization features, a typo fix in header.py and a feature that wasn't correctly implemented (degree's distance for the annotations).

The header change will come in another PR with a different feature branch. (Not sure why the tests aren't passing, my branch works properly locally, and all the tests are passing locally). Should be ready to merge.

rmodrak commented 3 months ago

Hi Julien, Using those keyword arguments would work, thanks for being accommodating. I think someone was requesting time discretization in the header as well, probably not urgent. We could revisit maybe in context of a new release (new header, optional misfit normalization, improved colorbar limits, etc.?). May try to take this up in mid May

rmodrak commented 3 months ago

Closing and reopening the pull request should cause the test to rerun. In between, could you please squash the merge commit?

thurinj commented 3 months ago

Sure, will do.