jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org
Other
47 stars 20 forks source link

ungap! throws bounds error on mseed file with many negative time gaps #29

Closed jpjones76 closed 4 years ago

jpjones76 commented 4 years ago

Originally posted by @tclements in https://github.com/jpjones76/SeisIO.jl/issues/28#issuecomment-554531721

One more particularly devious example. Here's an mseed file that has > 10 of negative time gaps. After merging, which works, ungap throws a bounds error.

S = read_data("mseed","CIRIO__BHE___2017101.mseed")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHE                        
  NAME: CI.RIO..BHE                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2017-04-11T00:00:00.020 (14 gaps)  # note 14 gaps 
     X: -5.600e+01                         
        -6.530e+02                         
            ...                            
        -5.850e+02                         
        (nx = 3457836)                     
     C: 0 open, 0 total

S has 14 gaps. Then merge the seisdata

merge!(S)
S
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHE                        
  NAME: CI.RIO..BHE                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 1 entries                          
     T: 2017-04-11T00:00:00.020 (12 gaps)  # note 12 gaps 
     X: -5.600e+01                         
        -6.530e+02                         
            ...                            
        -5.850e+02                         
        (nx = 3456006)                     
     C: 0 open, 0 total

S now has 12 gaps. Now try to ungap

ungap!(S1)
ERROR: BoundsError

which throws a bounds error. For reference, obspy reads 15 overlapping traces but successfully merges and ungaps.

In [1]: import obspy                                                                                 

In [2]: st = obspy.read("CIRIO__BHE___2017101.ms")                                                   

In [3]: st                                                                                           
Out[3]: 
15 Trace(s) in Stream:
CI.RIO..BHE | 2017-04-11T00:00:00.019500Z - 2017-04-11T18:38:23.219500Z | 40.0 Hz, 2684129 samples
CI.RIO..BHE | 2017-04-11T18:38:18.095038Z - 2017-04-11T18:38:28.395038Z | 40.0 Hz, 413 samples
CI.RIO..BHE | 2017-04-11T18:38:23.245038Z - 2017-04-11T18:38:33.520038Z | 40.0 Hz, 412 samples
CI.RIO..BHE | 2017-04-11T18:38:28.420038Z - 2017-04-11T18:38:33.545038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:33.544538Z - 2017-04-11T18:38:38.669538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:33.570038Z - 2017-04-11T18:38:38.695038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:38.694538Z - 2017-04-11T18:38:43.819538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:38.720038Z - 2017-04-11T18:38:43.845038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:43.844538Z - 2017-04-11T18:38:48.969538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:43.870038Z - 2017-04-11T18:38:48.995038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:48.994538Z - 2017-04-11T18:38:54.119538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:49.020038Z - 2017-04-11T18:38:59.320038Z | 40.0 Hz, 413 samples
CI.RIO..BHE | 2017-04-11T18:38:54.594538Z - 2017-04-11T18:38:59.719538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:59.345038Z - 2017-04-11T18:39:04.420038Z | 40.0 Hz, 204 samples
CI.RIO..BHE | 2017-04-11T18:38:59.744538Z - 2017-04-11T23:59:59.994538Z | 40.0 Hz, 770411 samples

In [4]: st.merge()                                                                                   
Out[4]: 
1 Trace(s) in Stream:
CI.RIO..BHE | 2017-04-11T00:00:00.019500Z - 2017-04-11T23:59:59.994500Z | 40.0 Hz, 3456000 samples

This seems like a pretty rare problem. This is the only day in 20 years of data for this one station I've had an issue with after the latest update.

jpjones76 commented 4 years ago

Split from Issue #28.

ungap! appears unable to resolve negative time gaps less than one sample long. I don't know if I should consider this a bug or an oversight on my part. I recoded gapfill! recently, which ungap! calls, acting on the assumption that negative time gaps get fixed by merge!; but I implicitly wrote merge! so that negative subsample gaps persist. Not my smartest coding decision.

I'm going to modify gapfill! to deal with this.

A fix won't make the outputs of SeisIO and ObsPy identical. In your example, the outputs of ObsPy and SeisIO will only match if each pair of samples at an overlapping time comprises two identical values.

A bit of discussion follows.

Merge Behavior and Seismic Software

No "correct" solution exists for different data samples at overlapping times. Each program has its' own method(s) of handling this problem.

SAC

overwrites.

ObsPy

allows one to specify how to handle overlapping samples with a keyword: (https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.__add__.html#obspy.core.trace.Trace.__add__)

SeisIO

averages pointwise pairs in the absence of NaNs after checking for time shifts.

The "Time-Shifted Merge" Problem

There's an annoyingly common situation in which two trace files contain identical samples that are time-shifted by a few sample intervals. For example, file trace1.sac ends at 12:00:00; file trace2.sac starts at 11:59:59.99; both sample at 100 Hz; yet the first two samples of trace2.sac are identical to the last two samples of trace1.sac.

jpjones76 commented 4 years ago

I'm testing a robust fix now. The file you included actually has three data problems:

  1. Packet ordering creates time segments that aren't chronological
  2. Negative time jumps
  3. Time jumps have a "remainder" more than half the sampling interval

The first two problems have been handled by merge since I wrote it, but the "remainder" in the negative time jump causes the problem.

tclements commented 4 years ago

Great! Happy to test this new method on my large set of corrupted files.

jpjones76 commented 4 years ago

This issue appears resolved by my tests. It might be worth reflecting on the necessity of the data set if corrupted files continue to cause this much trouble; this issue took around 20 hours of recoding and testing before I found a solution that didn't break my merge! tests. At some point data become too corrupt to trust, which I fear any weirder errors will suggest. Still, please keep me posted as your current work develops.