jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org
Other
47 stars 21 forks source link

filtfilt! converts all data to NaN #82

Closed dttrugman closed 3 years ago

dttrugman commented 3 years ago

I am new to SeisIO, so apologies if I am doing something stupid.

I want to implement a simple workflow where I (1) download data and (2) do some simple preprocessing, including a bandpass filter. When I try to filter the data, however, the entire time series appears to be corrupted. For example:

using SeisIO tstart = "2020-03-26T15:00:00" tend = "2020-03-26T15:30:00" CHA = "TX.PECS.00.HHZ" S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend,) println(S) println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) ungap!(S) # remove gaps println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) detrend!(S) # remove mean and trend println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) taper!(S) # taper ends println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) filtfilt!(S,fl=1.0,fh=10.0) println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))

Gives the following response:

SeisData with 1 channels (1 shown) ID: TX.PECS.00.HHZ
NAME: Pecos
LOC: 31.3704 N, -103.867 E, 914.0 m
FS: 200.0
GAIN: 4.77413e8
RESP: a0 1.0, f0 1.0, 6z, 11p
UNITS: m/s
SRC: http://service.iris.edu/fdsnws/da… MISC: 4 entries
NOTES: 4 entries
T: 2020-03-26T15:00:00 (0 gaps)
X: -1.010e+03
-1.009e+03
...
-1.410e+03
(nx = 360001)
C: 0 open, 0 total min:-550105.0 nan: 0 min:-550105.0 nan: 0 min:-549159.7 nan: 0 min:-549159.7 nan: 0 min:NaN nan: 360001

Any idea what I am doing wrong? I am running julia version 1.5.2 if that helps.

dttrugman commented 3 years ago

As a follow up, I was able to filter without issuing using DSP.jl directly, e.g.:

xx0 = copy(S[1].x) responsetype = Bandpass(1.0,10.0,fs=200.0) designmethod = Butterworth(4) myfilter = digitalfilter(responsetype,designmethod) xx1 = filtfilt(myfilter,xx0)

So it appears to be an issue with SeisIO and not that library.

dttrugman commented 3 years ago

Any update on this? (Not to be pushy, but it's kind of a serious bug that makes SeisIO more or less unusable). In experiments, I've noticed that it consistently happens for higher sample rate data, though I'm not sure why.

jpjones76 commented 3 years ago

I've looked into your report but I see no bug. This can be fixed by converting your data to Float64 precision.

using SeisIO
tstart = "2020-03-26T15:00:00"
tend = "2020-03-26T15:30:00"
CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend,)
S.x[1] = Float64.(S.x[1]); # added by Josh 2021-03-19
ungap!(S) # remove gaps
detrend!(S) # remove mean and trend
taper!(S) # taper ends
filtfilt!(S,fl=1.0,fh=10.0)
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) # no NaNs

Detail

SeisIO.filtfilt! calls DSP.jl functions in all steps of the filtering process. There are, in fact, exactly two differences from DSP.filtfilt. The irrelevant difference is an optimization for memory conservation on multi-channel objects. The relevant difference is that DSP.filtfilt converts the input data to Float64 precision.

At Float32 precision, your filter choice creates coefficient vectors b, a whose ratios yield denominator values below eps(Float32) in the calls to filt!. NaNs are introduced in that step. The same filt! call in DSP.filtfilt works because a, b, z, X, and Y are Float64.

Incidentally, in Jones et al. (2020), we implicitly demonstrate that our filtering works correctly in Fig. 5b and the corresponding text. So I really don't understand your decision to denounce a published work as "unusable".

Resolution

I'm treating this as an implicit feature request for an automated way to prevent NaNs when calling filtfilt! on Float32 data. There may be a straightforward workaround. Unfortunately, for the scientific questions that SeisIO users are investigating, DSP.jl's forced Float64 conversion won't work: that causes more issues than it prevents.

dttrugman commented 3 years ago

Hi, thanks for your response. I just didn't know the data had to be Float64 before using filtfilt!

While it would be great for this to work on Float32, my "feature request" would really just be to provide some guidance and perhaps a worked example of how to do this properly in the Documentation and the Tutorial. I looked through both rather carefully and had no clue.... (I also noticed a similar problem in the Jupyter Notebooks when I ran them). One might also think about throwing an error if filtfilt! is applied to Float32?

In any case, I didn't mean at all to be disparaging about the software package. I am very excited to use it in my everyday workflow; I just need to be able to do simple things like filtering before making the conversion and having the confidence to make the switch. I asked around before posting this, and I am not the only user that has run into this issue and was puzzled by it.

Sorry to offend,

Daniel

jpjones76 commented 3 years ago

As I explained in my last comment, the user's choice of filter parameters is responsible for outputting NaNs. It's not a mistake for SeisIO to use Float32 data. Using Float32 data, rather than Float64, has exactly one effect: it narrows the range of viable filtering parameters.

To demonstrate this, suppose I modify your above example as follows:

using SeisIO
fl = 1.0e-7; fh = 1.0; tstart = "2020-03-26T15:00:00"; tend = "2020-03-26T15:30:00"; CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN", CHA,src="IRIS", s=tstart, t=tend)
S.x[1] = Float64.(S.x[1])
ungap!(S)
detrend!(S)
taper!(S)
X = deepcopy(S.x[1])
filtfilt!(S, fl=fl, fh=fh)
responsetype = Bandpass(fl, fh, fs=S.fs[1])
designmethod = Butterworth(4)
myfilter = digitalfilter(responsetype, designmethod)
X1 = filtfilt(myfilter, X)

Here, X1 contains all 64-bit NaNs. So does S.x[1].

A better solution for your data is to downsample. You're band passing with an upper corner frequency of 10.0 Hz. If you check the PSD estimate of the filter frequency response function, you'll find almost no signal energy above 12 Hz. Suppose we resample to 40.0 Hz, then:

using SeisIO
fl = 1.0e-3; fh = 10.0; tstart = "2020-03-26T15:00:00"; tend = "2020-03-26T15:30:00"; CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN", CHA,src="IRIS", s=tstart, t=tend)
S.x[1] = Float64.(S.x[1])
ungap!(S)
detrend!(S)
taper!(S)
resample!(S, fs=40.0)
filtfilt!(S, fl=fl, fh=fh)
findall(isnan.(S.x[1]))

No NaNs.

dttrugman commented 3 years ago

Got it, thanks. I agree that is the better solution in this case, though I initially came across the issue when removing the instrument response for high-sample rate data where downsampling beforehand would not have been an option.

In any case, I do think it would be useful to clarify this nuance in the documentation - I am not the first person to make this mistake, nor will I be the last. It is not an issue in other codes commonly used to process seismic data like SAC, ObsPy, etc., so is not obvious to people making the switch. (And correct me if I am wrong, but I saw the same issue in the filtering example notebooks?).

But it's your code, so it's really your call. Just giving my two cents.

kura-okubo commented 3 years ago

Hello,

Thank you for the great discussion here. It is very helpful for me to implement filtering with SeisIO. I have checked the script on the Fig. 5b in Jones et al. (2020) for the validation of instrument response removal. We used SeisIO.filtfilt!() before translating the instrument response (line 41 of the test script). We obtained the results without NaNs using the second order Butterworth filter in the article.

To further check the issue, I have conducted the bandpass with forth order Butterworth, and found the results with NaNs when datatype is Float32 as discussed in this issue.

using SeisIO

# get data
tstart = "2018-03-01T00:00:00.000"
tend = "2018-03-02T00:00:00.000"
CHA = "TA.121A.--.HHZ"

S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend, xf="TA.121A.xml")
println(S)
println(typeof(S[1].x))

freqmin = 0.05
freqmax = 19.9

# 2nd order Butterworth used in Jones et al. (2020) Figure 5b
S1 = deepcopy(S)
np = 2
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("2nd order with Float32 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 2nd order Butterworth with Float64
S1 = deepcopy(S)
np = 2
S1.x[1] = Float64.(S1.x[1])
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("2nd order with Float64 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 4th order Butterworth with Float32
S1 = deepcopy(S)
np = 4
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("4th order with Float32 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 4th order Butterworth with Float64
S1 = deepcopy(S)
np = 4
S1.x[1] = Float64.(S1.x[1])
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("4th order with Float64 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

Output:

SeisData with 1 channels (1 shown)
    ID: TA.121A..HHZ
  NAME: Cookes Peak, Deming, NM, USA
   LOC: 32.5324 N, -107.785 E, 1652.0 m
    FS: 100.0
  GAIN: 6.27604e8
  RESP: a0 3.48233f17, f0 0.4, 6z, 11p
 UNITS: m/s
   SRC: http://service.iris.edu/fdsnws/da…
  MISC: 4 entries
 NOTES: 4 entries
     T: 2018-03-01T00:00:00 (0 gaps)
     X: +8.070e+02
        +8.580e+02
            ...
        +9.880e+02
        (nx = 8640001)
     C: 0 open, 0 total
Array{Float32,1}
2nd order with Float32 min:-3507.7124 nan: 0
2nd order with Float64 min:-3507.8948733714124 nan: 0
4th order with Float32 min:NaN nan: 8640001
4th order with Float64 min:-3740.9291867467587 nan: 0

Thank you for the commit (Issue #82) on the documentation and the filtfilt!() function, which helps me understand better on the data type when applying the filter.

Best,

jpjones76 commented 3 years ago

Got it, thanks. I agree that is the better solution in this case, though I initially came across the issue when removing the instrument response for high-sample rate data where downsampling beforehand would not have been an option.

Please provide a minimum working example. I am sincerely sorry that you're finding SeisIO frustrating, but merely claiming that you've encountered an error in another place is not enough information to act on.

It is not an issue in other codes commonly used to process seismic data like SAC, ObsPy, etc.

https://github.com/obspy/obspy/issues/2529

so is not obvious to people making the switch. (And correct me if I am wrong, but I saw the same issue in the filtering example notebooks?).

You can verify what I wrote in my last reply by checking the closed Issues: no one has ever reported NaNs in the tutorial notebooks before. In fact, the only tutorial-specific problem at all was Issue #76, due to a minor typo in the documentation.

But it's your code, so it's really your call. Just giving my two cents.

Thank you for your feedback. I added two commits yesterday, one of which resolves the example in your initial comment. I plan to do a release with those commits today (22 March).

dttrugman commented 3 years ago

Perfect, thanks for the commits. I think that will help everyone in the long run. The example I posted above was my attempt at a "minimum working example" because the place where I initially spotted the error was in the middle of a long complex workflow with data that is not publicly available.

In terms of the tutorial, you may be on it already with your commit, but the place I am referring to is in the notebook called Part4-Processing, Cell #9: Sf = remove_resp(S, chans=4:S.n)

If we print Sf.x, we get: 4-element Array{Union{AbstractArray{Float32,1}, AbstractArray{Float64,1}},1}: Float32[2.7664585f6, 2.308987f6, 1.9008121f6, 1.7666095f6, 1.8152758f6, 1.8575735f6, 1.7005322f6, 1.3385941f6, 846242.1, 320249.0 … -215298.5, -502047.0, -700261.25, -781168.4, -807831.6, -849426.06, -916592.8, -449656.7, 19702.234, 30941.07] Float32[29794.031, -14405.785, -47402.758, -35956.305, 2092.416, 22985.586, 1762.7256, -27433.664, -11251.711, 61675.17 … 18562.438, 80717.66, 90282.016, 69408.92, 54823.605, 54827.85, 56621.652, 30729.158, 6788.757, 6151.325] Float32[NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN … NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN] Float32[NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN … NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]

I know this hasn't been brought up in the forums, as I did a search before opening a new issue. But I also reached out to two colleagues who have experience with SeisIO. Both had run across the NaN issue before, neither knew why it was happening, and one of the two had stopped using the code because of it. All three of us have a non-trivial amount of experience in observational seismology, though clearly don't have as much expertise in the underlying algorithms as you do.

As you point out, it is not a "bug" but a misunderstanding of the effect that using Float32 data types can have in certain circumstances. So as long as new users are aware, I don't think this is a major issue. (As an aside, the obspy forum you mentioned appears unrelated to filtering / instrument response, though perhaps it is related in some deeper way that escapes me).

Thanks again for all your hard work on the software package. I really look forward to using it more.

Daniel

jpjones76 commented 3 years ago

With apologies, I'm not seeing NaNs there at all.

image

It's possible that IRIS had no data for some channels, or other data issues, when you executed those test commands. In creating that part of the tutorial, we tried to choose test stations that we knew had excellent uptimes, but no station is 100%.

It's also possible that my commits from March 20th fixed it.

I can try to add some code to the tutorial that checks for station and data issues.

dttrugman commented 3 years ago

Hi,

I looked into this again. I think you are right, it is a data availability issue. I re-downloaded and re-installed. Now when running the tutorial, I get the following:

Screen Shot 2021-03-22 at 5 48 58 PM

Then I sat five minutes and ran again:

Screen Shot 2021-03-22 at 5 49 28 PM

So the NaNs go in and out depending on the time of the day. Likely unrelated to the filtering issue, but useful to know about.

Daniel

jpjones76 commented 3 years ago

Perfect. I think it should be relatively straightforward to check for that in the tutorial.

jpjones76 commented 3 years ago

Patch-level release (v1.2.1) completed. Includes the filtfilt! changes, various tutorial updates (some related to this), and an unrelated SAC problem. All automated checks have passed, so it should be in the Julia package registry tomorrow.

dttrugman commented 3 years ago

Great, download and installed, and all tests passed of course.