uafgeotools / waveform_collection

Collect seismic/infrasound waveforms and metadata from IRIS/WATC/AVO servers, local miniSEED files, etc.
https://uaf-waveform-collection.readthedocs.io/
MIT License
13 stars 7 forks source link

Use fill_value=0 for optional merge #21

Closed liamtoney closed 4 years ago

liamtoney commented 4 years ago

Consider the following command:

from obspy import UTCDateTime
from waveform_collection import gather_waveforms

st = gather_waveforms(
    'IRIS',
    'AT',
    'SIT',
    '',
    'BHZ',
    UTCDateTime('2014-02-16T14:22:14'),
    UTCDateTime('2014-02-16T14:31:40'),
    merge=True,
)

st.plot()

Without merge, two traces are returned here. So we want to merge. With the current code, the output is: old With this PR, we use fill_value=0 in the call to Stream.merge(), and the result is: new This is what we'd expect.

The reason why the first plot looks weird is that Stream.merge() creates masked arrays by default. Later on in the code, when we trim and pad with zeros, things go haywire. This modification assures that traces are merged and filled with zeros.

jwbishop commented 4 years ago

I like this fix for practical reasons. It may be nice to remind people that the data is artificially set to 0 in the processing though. Perhaps dumping something like "Filling potential data gaps with 0." to the screen.

jegestrich commented 4 years ago

I have thought about this problem before and it's a bit tricky. It could lead to problems when detrending the signal etc. For very small gaps I usually used fill_value='interpolate' but I see how this is not a good option for larger gaps. It would be great if one could fill it with nan values but that might lead to other problems, right?

liamtoney commented 4 years ago

It would be great if one could fill it with nan values but that might lead to other problems, right?

Yeah, that's my concern. I think it might give similar issues to the "masked array" default.

liamtoney commented 4 years ago

Okay, I've added some warnings when we use fill_value=0. These hopefully can serve as reminders to folks if they see weird or unexpected results. Unfortunately there is no great option for the fill value.

jegestrich commented 4 years ago

Another problem with setting it to 0 is that you wouldn't be able to cut out these sections later (If you want to convert it into a numpy array and set the missing values to nan there to not let it interfere with the processing / statistics). You would have to do that manually then. But maybe this is just not a solvable problem. Just thought I'd share this concern.

liamtoney commented 4 years ago

Hmmm... that's a good point, it could be nice to retain masked. I wonder if we could have an optional kwarg called e.g. masked that is by default False. If True, we don't use fill_value=0 in either the merging or trimming steps. This results in a masked output.

(If we do the trim step with fill_value=0 on masked arrays, things go haywire as seen in the top comment.)

Thoughts on this option? @jwbishop @davidfee5 @jegestrich

davidfee5 commented 4 years ago

I like the option kwarg idea to provide additional flexibility. Would we want to include other fill options too?

jegestrich commented 4 years ago

How about making the kwarg equal to the kwarg in the merge command and defaults to fill_value=None

davidfee5 commented 4 years ago

I think I might prefer fill_value=0 as default

jegestrich commented 4 years ago

Ah sorry that's what I meant!

liamtoney commented 4 years ago

How about making the kwarg equal to the kwarg in the merge command

Good idea! I think we'd want to make it the same for the trim command too though. Both methods allow for a scalar value or None. Merge additionally allows for 'interpolate' or 'latest'. How to handle that? We could either not allow the string inputs, or we could tell the trim command to use fill_value=0 for these cases of inputs to merge.

liamtoney commented 4 years ago

Alright all, I implemented a more flexible method where the user can specify the fill value for both merge and trim. Both default to 0, which performs the merge and trim with a fill value of 0. But if either is False, we don't perform that operation. Take a look at the "Files changed" tab and let me know if you're satisfied!

🚨 Note that his will break any code that uses gather_waveforms() or gather_waveforms_bulk() with the merge kwarg since that is no longer present! 🚨

———

Pinging @davidfee5, @jegestrich, and @jwbishop.