Closed liamtoney closed 4 years ago
I like this fix for practical reasons. It may be nice to remind people that the data is artificially set to 0 in the processing though. Perhaps dumping something like "Filling potential data gaps with 0." to the screen.
I have thought about this problem before and it's a bit tricky. It could lead to problems when detrending the signal etc. For very small gaps I usually used fill_value='interpolate'
but I see how this is not a good option for larger gaps. It would be great if one could fill it with nan values but that might lead to other problems, right?
It would be great if one could fill it with nan values but that might lead to other problems, right?
Yeah, that's my concern. I think it might give similar issues to the "masked array" default.
Okay, I've added some warnings when we use fill_value=0
. These hopefully can serve as reminders to folks if they see weird or unexpected results. Unfortunately there is no great option for the fill value.
Another problem with setting it to 0 is that you wouldn't be able to cut out these sections later (If you want to convert it into a numpy array and set the missing values to nan there to not let it interfere with the processing / statistics). You would have to do that manually then. But maybe this is just not a solvable problem. Just thought I'd share this concern.
Hmmm... that's a good point, it could be nice to retain masked. I wonder if we could have an optional kwarg called e.g. masked
that is by default False
. If True
, we don't use fill_value=0
in either the merging or trimming steps. This results in a masked output.
(If we do the trim step with fill_value=0
on masked arrays, things go haywire as seen in the top comment.)
Thoughts on this option? @jwbishop @davidfee5 @jegestrich
I like the option kwarg idea to provide additional flexibility. Would we want to include other fill options too?
How about making the kwarg equal to the kwarg in the merge command and defaults to fill_value=None
I think I might prefer fill_value=0 as default
Ah sorry that's what I meant!
How about making the kwarg equal to the kwarg in the merge command
Good idea! I think we'd want to make it the same for the trim command too though. Both methods allow for a scalar value or None
. Merge additionally allows for 'interpolate'
or 'latest'
. How to handle that? We could either not allow the string inputs, or we could tell the trim command to use fill_value=0 for these cases of inputs to merge.
Alright all, I implemented a more flexible method where the user can specify the fill value for both merge and trim. Both default to 0
, which performs the merge and trim with a fill value of 0. But if either is False
, we don't perform that operation. Take a look at the "Files changed" tab and let me know if you're satisfied!
🚨 Note that his will break any code that uses gather_waveforms()
or gather_waveforms_bulk()
with the merge
kwarg since that is no longer present! 🚨
———
Pinging @davidfee5, @jegestrich, and @jwbishop.
Consider the following command:
Without merge, two traces are returned here. So we want to merge. With the current code, the output is: With this PR, we use
fill_value=0
in the call toStream.merge()
, and the result is: This is what we'd expect.The reason why the first plot looks weird is that
Stream.merge()
creates masked arrays by default. Later on in the code, when we trim and pad with zeros, things go haywire. This modification assures that traces are merged and filled with zeros.