read_data: problem with negative time gaps in mini-SEED

tclements commented 4 years ago

I have set of somewhat corrupted day-long miniseed files (BH?) that have two channels each. The first channel is the first few seconds of the day (~100 samples). The second channel is the entire daylong trace (3456000 samples). So the first few seconds are repeated across the two channels. When using read_data, the first channel is included with the second channel giving a single channel of 3456080 samples.

S = read_data("mseed","CIRIO__BHZ___2012132.mseed")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2012-05-11T00:00:00.020 (0 gaps)   
     X: -1.590e+03                         
        -2.958e+03                         
            ...                            
        -1.521e+03                         
        (nx = 3456080)                     
     C: 0 open, 0 total
all(S[1].x[1:80] .== S[1].x[81:160])
true

The first 80 samples are definitely repeated. I expect to get two channels when using get_data as Obspy.read gives:

import obspy 
obspy.read("CIRIO__BHZ__2012132.mseed")
2 Trace(s) in Stream:
CI.RIO..BHZ | 2012-05-11T00:00:00.019500Z - 2012-05-11T00:00:01.994500Z | 40.0 Hz, 80 samples
CI.RIO..BHZ | 2012-05-11T00:00:00.019500Z - 2012-05-11T23:59:59.994500Z | 40.0 Hz, 3456000 samples

I tried to change the nx_new KW to a smaller number but that didn't give me two channels. Here is a link to an example miniseed file.

jpjones76 commented 4 years ago

Ok. I have two questions for you:

Do the two channels have exactly the same ID, as the ObsPy stream suggests?
Is the desired behavior to read the SEED volume into two separate channels? (one with 80 samples, one with the rest)

jpjones76 commented 4 years ago

I think I see the problem. I can fix it quickly but the output in SeisIO will still be one channel. You definitely found a real bug; looks like I'm incorrectly handling negative time gaps in mini-SEED. In my defense I've never seen a negative time jump greater than one sample (nor a SEED volume with redundant packets).

I can't produce two channels from that volume without adding a very breakable control loop that starts a new channel whenever a negative time jump gets encountered. That's what ObsPy is doing. I think ObsPy is making a mistake because that's a very breakable condition (imagine every negative time correction starting a new channel). In fact, I think you should also report this as a bug in ObsPy.

I know no "right" way to handle a negative time jump between two SEED packets because I know of no procedure for it in the SEED manual. I'm not even sure that the SEED file spec requires packets to be in chronological order ... or even unique!

tclements commented 4 years ago

This one's new for me too. Outputting one channel is preferable if the data are simply repeated. Beyond that case I'm not sure the best way to merge the gap.

jpjones76 commented 4 years ago

I think I have a bug fix already. Tests pass. Will push live soon.

Workflow warning: you'll need to run merge!(S) after loading. I can't ignore the negative time jump, but if I write it as a negative time gap in :t, then merge! fixes everything, including deleting the redundant data.

Is it OK to add this file to my test data? It's really useful to have a perfect example of a negative time jump.

Console log follows:

julia> S = read_data("/data1/Downloads/CIRIO__BHZ___2012132.mseed.ms")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2012-05-11T00:00:00.020 (1 gaps)   
     X: -1.590e+03                         
        -2.958e+03                         
            ...                            
        -1.521e+03                         
        (nx = 3456080)                     
     C: 0 open, 0 total

julia> X = S.x[1][1:80];

julia> @test X == S.x[1][81:160]
Test Passed

julia> S.t
1-element Array{Array{Int64,2},1}:
 [1 1336694400019500; 81 -2000000; 3456080 0]

julia> merge!(S)

julia> @test X == S.x[1][1:80]
Test Passed

julia> @test X != S.x[1][81:160]
Test Passed

julia> merge!(S)

julia> S.t
1-element Array{Array{Int64,2},1}:
 [1 1336694400019500; 3456000 0]

julia> length(S.x[1])
3456000

tclements commented 4 years ago

Awesome, thank you! Yes, for sure, include it in the test data.

tclements commented 4 years ago

One more particularly devious example. Here's an mseed file that has > 10 of negative time gaps. After merging, which works, ungap throws a bounds error.

S = read_data("mseed","CIRIO__BHE___2017101.mseed")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHE                        
  NAME: CI.RIO..BHE                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2017-04-11T00:00:00.020 (14 gaps)  # note 14 gaps 
     X: -5.600e+01                         
        -6.530e+02                         
            ...                            
        -5.850e+02                         
        (nx = 3457836)                     
     C: 0 open, 0 total

S has 14 gaps. Then merge the seisdata

merge!(S)
S
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHE                        
  NAME: CI.RIO..BHE                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 1 entries                          
     T: 2017-04-11T00:00:00.020 (12 gaps)  # note 12 gaps 
     X: -5.600e+01                         
        -6.530e+02                         
            ...                            
        -5.850e+02                         
        (nx = 3456006)                     
     C: 0 open, 0 total

S now has 12 gaps. Now try to ungap

ungap!(S1)
ERROR: BoundsError

which throws a bounds error. For reference, obspy reads 15 overlapping traces but successfully merges and ungaps.

In [1]: import obspy                                                                                 

In [2]: st = obspy.read("CIRIO__BHE___2017101.ms")                                                   

In [3]: st                                                                                           
Out[3]: 
15 Trace(s) in Stream:
CI.RIO..BHE | 2017-04-11T00:00:00.019500Z - 2017-04-11T18:38:23.219500Z | 40.0 Hz, 2684129 samples
CI.RIO..BHE | 2017-04-11T18:38:18.095038Z - 2017-04-11T18:38:28.395038Z | 40.0 Hz, 413 samples
CI.RIO..BHE | 2017-04-11T18:38:23.245038Z - 2017-04-11T18:38:33.520038Z | 40.0 Hz, 412 samples
CI.RIO..BHE | 2017-04-11T18:38:28.420038Z - 2017-04-11T18:38:33.545038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:33.544538Z - 2017-04-11T18:38:38.669538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:33.570038Z - 2017-04-11T18:38:38.695038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:38.694538Z - 2017-04-11T18:38:43.819538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:38.720038Z - 2017-04-11T18:38:43.845038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:43.844538Z - 2017-04-11T18:38:48.969538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:43.870038Z - 2017-04-11T18:38:48.995038Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:48.994538Z - 2017-04-11T18:38:54.119538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:49.020038Z - 2017-04-11T18:38:59.320038Z | 40.0 Hz, 413 samples
CI.RIO..BHE | 2017-04-11T18:38:54.594538Z - 2017-04-11T18:38:59.719538Z | 40.0 Hz, 206 samples
CI.RIO..BHE | 2017-04-11T18:38:59.345038Z - 2017-04-11T18:39:04.420038Z | 40.0 Hz, 204 samples
CI.RIO..BHE | 2017-04-11T18:38:59.744538Z - 2017-04-11T23:59:59.994538Z | 40.0 Hz, 770411 samples

In [4]: st.merge()                                                                                   
Out[4]: 
1 Trace(s) in Stream:
CI.RIO..BHE | 2017-04-11T00:00:00.019500Z - 2017-04-11T23:59:59.994500Z | 40.0 Hz, 3456000 samples

This seems like a pretty rare problem. This is the only day in 20 years of data for this one station I've had an issue with after the latest update.

jpjones76 commented 4 years ago

Moving this to a new issue since this involves a different set of functions. This might be the weirdest example of trace data that I've ever seen, but it does appear to catch a bug in ungap!.

jpjones76 / SeisIO.jl

read_data: problem with negative time gaps in mini-SEED #28