jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org
Other
47 stars 21 forks source link

read_data fails on file with fs == 0 #81

Open tclements opened 3 years ago

tclements commented 3 years ago

With SeisIO v1.2.0

(@v1.5) pkg> st
Status `~/.julia/environments/v1.5/Project.toml`
  [b372bb87] SeisIO v1.2.0

I found an mseed file with 113 different unique sampling rates.

using SeisIO
SeisIO.SEED.scan_seed("/home/timclements/CIHEC__BHZ___1999289.ms")

        CHANNEL |       N PTS | N GAPS | N FS 
----------------+-------------+--------+-----
    CI.HEC..BHZ |     1359978 |    151 |  113 

1-element Array{String,1}:
 "CI.HEC..BHZ, nx = 1359978, ngaps = 151, nfs = 113"

When trying to read the file, I get a BoundsError:

S = read_data("mseed","/home/timclements/CIHEC__BHZ___1999289.ms")
ERROR: BoundsError: attempt to access 65535-element Array{Float32,1} at index [0]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] SEED_Steim!(::IOStream, ::SeisIO.SeisIOBuf, ::UInt16) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/1_mSEEDdec.jl:193
 [3] parserec!(::SeisData, ::SeisIO.SeisIOBuf, ::IOStream, ::Int64, ::Int64, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/2_parserec.jl:246
 [4] parsemseed!(::SeisData, ::IOStream, ::Int64, ::Int64, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/readmseed.jl:11
 [5] read_mseed_file!(::SeisData, ::String, ::Int64, ::Int64, ::Bool, ::Bool, ::Int64) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Submodules/SEED/readmseed.jl:23
 [6] read_data!(::SeisData, ::String, ::String; cf::String, full::Bool, jst::Bool, ll::UInt8, memmap::Bool, nx_add::Int64, nx_new::Int64, strict::Bool, swap::Bool, v::Int64, vl::Bool) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:120
 [7] read_data(::String, ::String; full::Bool, cf::String, jst::Bool, ll::UInt8, memmap::Bool, nx_add::Int64, nx_new::Int64, strict::Bool, swap::Bool, v::Int64, vl::Bool) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:314
 [8] read_data(::String, ::String) at /home/timclements/.julia/packages/SeisIO/tEh5g/src/Wrappers/read_data.jl:313
 [9] top-level scope at REPL[12]:1

Checking this out with obspy shows that there are some data gaps with sampling rate == 0 Hz:

import obspy 
import numpy as np
st = obspy.read("CIHEC__BHZ___1999289.ms")
print(st[39].stats)
         network: CI
         station: HEC
        location: 
         channel: BHZ
       starttime: 1999-10-16T09:46:43.675537Z
         endtime: 1999-10-16T09:46:43.675537Z
   sampling_rate: 0.0
           delta: 0
            npts: 0
           calib: 1.0
         _format: MSEED
           mseed: AttribDict({'dataquality': 'D', 'number_of_records': 1, 'encoding': 'STEIM1', 'byteorder': '>', 'record_length': 512, 'filesize': 2440192})

# find unique sampling rates 
np.unique([tr.stats.sampling_rate for tr in st])
array([  0.,  20.])

# how many samples have fs == 0 
np.sum([tr.stats.sampling_rate == 0. for tr in st])
56

I only found this once while reading a few million files, so this is not a pressing error at all but the file might be good to include in the test set. mseed file attached here: mseed.zip

jpjones76 commented 3 years ago

I'll have to check the file manually to see what's going on with sample rate. If field 11 of the data header is 0.0, and field 10 isn't, I think it's not valid SEED. If field 10 is 0.0, there's supposed to be special handling. I've never coded that case because I've never encountered an example that mixed opaque data with normal data on the same channel. [Per the SEED manual, field 10 = 0.0 implies that the data record contains opaque data, i.e., Blockette 2000.]

It's funny that you should mention it as a one in a million occurrence, in fact: opaque data is so rare that I've only seen it in one test file, which I had to ask Chad Trabant at IRIS to find. Even he couldn't find an example immediately, and the test file he found contained only opaque data for the channel -- so this file is the first case I've heard of where a single channel mixes time-series and opaque data. I actually wonder how the IRIS seed reader handles it...