uafgeotools / waveform_collection

Collect seismic/infrasound waveforms and metadata from IRIS/WATC/AVO servers, local miniSEED files, etc.
https://uaf-waveform-collection.readthedocs.io/
MIT License
11 stars 6 forks source link

gather_waveforms fails because of a merging error when sample rates are erroneous #34

Closed darren-tpk closed 5 months ago

darren-tpk commented 11 months ago

gather_waveforms throws an error Exception: Can't merge traces with same ids but differing sampling rates! when sampling rates are inconsistent across traces with the same id. This is most commonly the case where sampling rate is an incorrect runaway float, for example, 100.00000XX or 99.999999XX instead of 100.

To reproduce:

from waveform_collection import gather_waveforms
from obspy import UTCDateTime
stream = gather_waveforms(source='IRIS', network='AV', station='VNHG',
                          location='', channel='*HZ', starttime=UTCDateTime(2003,1,1), 
                          endtime=UTCDateTime(2003,1,1,1), verbose=False)

One potential solution would be to edit _safe_merge in server.py with the added failsafe of rounding off sampling rates, on top of the existing failsafe that checks for data types.

    try:
        st.merge(fill_value=fill_value)
    except Exception:  # ObsPy raises an Exception if data types are not all identical
        for tr in st:
            if tr.data.dtype != np.dtype(np.int32):
                tr.data = tr.data.astype(np.int32, copy=False)
    try:
        st.merge(fill_value=fill_value)
    except Exception:  # ObsPy also raises an Exception if traces with the same ids have different sampling rates
        for tr in st:
            if tr.stats.sampling_rate != np.round(tr.stats.sampling_rate):
                warnings.warn('Rounding off %s sample rate to the nearest integer for merge compatibility' % tr.id,
                              CollectionWarning)
                tr.stats.sampling_rate = np.round(tr.stats.sampling_rate)
        st.merge(fill_value=fill_value)  # Try merging with rounded sampling rates

Though I wonder if there are instances where this rounding of sample rate might not be desired? Happy to submit a PR with the above fix if it's a agreeable solution

liamtoney commented 11 months ago

Thanks for the detailed issue, Darren. I recall running into this issue before. It's frustrating! (I wonder how this happens and if it's an EarthScope-side issue or something w/ ObsPy...)

I assume you're aware of the merge_fill_value=False option so that you can at least avoid the error in the short term?

Regarding your proposed fix, I think this makes sense. I don't love the idea of rounding metadata like that, but clearly that didn't stop us from implementing the data type conversion work-around 🤒 and I don't really see another way.

I think given that the user can bypass this behavior with merge_fill_value=False, it's a good solution. Placing it in _safe_merge() also is logical. Rounding to the nearest integer is a reasonable assumption, too. I think a PR would be welcome — @davidfee5 what do you think?

The two things I'd change from what you've proposed above are:

  1. If verbose=True, print the original and new sampling rates for the user's info (in a way that shows the non-integer value for the original sampling rate) to give the user confidence
  2. In the control flow you propose above, if neither statement raises an error the st gets merged twice. Is that fine? Maybe? The other way to do it would be to catch (except) the Exception once and then check the message of the exception: https://stackoverflow.com/questions/13531247/python-catching-specific-exception — seems more fragile though since we then rely on the specific error text, e.g. the string literal 'Can't merge traces with same ids but differing sampling rates!'
davidfee5 commented 10 months ago

Sorry for the delay here. I think @darren-tpk's proposed solution and @liamtoney's consideration #1 sound good, although I agree #2 might be too specific?

The only rare case I can see this being a real issue is that if the station ACTUALLY changes sample rate, say like a BDF channel goes from 20 to 40 Hz. That isn't related to the rounding and would not be caught by this addition, but it does occasionally (again rarely) happen.

I suggest @darren-tpk make a pull request here? thanks

liamtoney commented 5 months ago

Closed in #35