obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.
https://www.obspy.org
Other
1.15k stars 527 forks source link

polarization_analysis: Exceptions in setting up input/output arrays (wrong size etc.) #3034

Open nukenine opened 2 years ago

nukenine commented 2 years ago

Avoid duplicates

Bug Summary

I have been running obspy.signal.polarization.polarization_analysis w/ the Vidale method vidale flag on. I noticed that the resulting timestamp output starts later and ends earlier than the original input stream times.

For example, with a stream that has a sampling_rate of 2kHz and is 30 seconds in duration w/ a delta of 0.0005:

st[0].times() is [0, 0.0005, 0.0010, etc.]

but

timestamp output from polarization_analysis is [0.03, 0.0305, 0.0310, etc.].

So, the resulting timestamp array begins LATER and ends EARLIER than the st[0].times() array. Is this a bug? Or is it an expected/necessary result of the polarization analysis?

My problem is that, I have ~60 minute 3-component file that I am trying to run a polarization analysis on. I loop through the file with st.slide() in 10 second (non-overlapping) window increments, and on each window run polarization_analysis. I append the resulting timestamp array output from each window into a list (ultimately it makes a list of lists, which I flatten outside of the loop). I ultimately want to plot a one for one of the original st[0].data w/ the polarization output azimuth. But, since the azimuth timestamp begins 0.03 seconds and not at 0.0 seconds (like the st[0].data does, AND since each window will start a little later and end a little earlier, there are essentially big skips in time in the resulting timestamp list.

I can't include these data, but here is basically what the command looks like:

for _st_ in st.slide(window_length=10, step=10):    
    result=pa(_st_, 1., 0.5, frqlow=1, frqhigh=20, stime=_st_[0].stats.starttime, 
          etime=_st_[0].stats.endtime, method='vidale', verbose=True)

First window output timestamp: 'timestamp': array([0.03 , 0.0305, 0.031 , ..., 9.9835, 9.984 , 9.9845]) Second window output timestamp: 'timestamp': array([10.03 , 10.0305, 10.031 , ..., 19.9865, 19.987 , 19.9875])

See the skip from the last element of the 1st array to the first element of the 2nd array? It's greater than the delta which is 0.0005.

I tried to reproduce using the embedded 3C data in obspy, but it's erroring out (not sure why either)..in the reproduction I didn't try to slide through the stream because I just wanted to see if the timestamp was shifted on just one result of the polarization analysis, but it won't run.

Code to Reproduce

from obspy import read
from obspy.signal.polarization import polarization_analysis as pa

st=read()

chans=['GHZ', 'GHN', 'GHE']

st[0].stats.channel=chans[0]
st[1].stats.channel=chans[1]
st[2].stats.channel=chans[2]

stc=st.copy()

stc.trim(starttime=st[0].stats.starttime, endtime=st[0].stats.starttime+10)
print(stc)

result=pa(stc, 2, 0.5, frqlow=1, frqhigh=20, 
         stime=stc[0].stats.starttime, etime=stc[0].stats.endtime, method='vidale', verbose=True)

Error Traceback

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18852/2551111523.py in <module>
      4 print(stc)
      5 
----> 6 result=pa(stc, 2, 0.5, frqlow=1, frqhigh=20, 
      7          stime=stc[0].stats.starttime, etime=stc[0].stats.endtime, method='vidale', verbose=True)

~\Anaconda3\lib\site-packages\obspy\signal\polarization.py in polarization_analysis(stream, win_len, win_frac, frqlow, frqhigh, stime, etime, verbose, method, var_noise, adaptive)
    506     spoint, _epoint = _get_s_point(stream, stime, etime)
    507     if method.lower() == "vidale":
--> 508         res = vidale_adapt(stream, var_noise, fs, frqlow, frqhigh, spoint,
    509                            stime, etime)
    510     else:

~\Anaconda3\lib\site-packages\obspy\signal\polarization.py in vidale_adapt(stream, noise_thres, fs, flow, fhigh, spoint, stime, etime, adaptive)
    270         xx = np.zeros((3, mask.sum()), dtype=np.complex128)
    271         # East
--> 272         xx[0, :] = ea
    273         # North
    274         xx[1, :] = na

ValueError: could not broadcast input array from shape (1001,) into shape (1000,)

ObsPy Version?

ObsPy 1.3.0

Operating System?

Windows 10

Python Version?

3.9.7

Installation Method?

conda

nukenine commented 2 years ago

Update: I'm unsure if there's an issue with the vidale flag specifically. When I run the Code to Reproduce above it shows that traceback error, but if I then print(result) it does in fact show results, everything except ellipticity. Even stranger, if I change vidale to flinn or pm, it runs fine. So, I'm wondering if it's the vidale method that is specifically causing the error. It looks like all three still trim the timestamps at each end, but for some reason, the vidale has this other error when I try to run with the example or even my own data.

megies commented 2 years ago

So, the resulting timestamp array begins LATER and ends EARLIER than the st[0].times() array. Is this a bug? Or is it an expected/necessary result of the polarization analysis?

Without looking at the code, my guess would be the timestamps are calculated as the center of the sliding window.

Second window output timestamp

Not sure what you mean with "second window", but if you mean that you work across file boundaries and loop over multiple streams, you will have a step since this is working on sliding windows and unless you do some extra coding efforts (i don't know the internals of slide() you will miss out on some window(s) spanning the stream boundaries

ValueError: could not broadcast input array from shape (1001,) into shape (1000,)

This shouldn't happen tho, somebody should have a look.

from obspy import read
from obspy.signal.polarization import polarization_analysis as pa

st = read()
st.trim(starttime=st[0].stats.starttime, endtime=st[0].stats.starttime+10)

pa(st, 2, 0.5, frqlow=1, frqhigh=20, stime=st[0].stats.starttime,
   etime=st[0].stats.endtime, method='vidale', verbose=True)
nukenine commented 2 years ago

Without looking at the code, my guess would be the timestamps are calculated as the center of the sliding window.

OK that makes sense actually,

Not sure what you mean with "second window", but if you mean that you work across file boundaries and loop over multiple streams, you will have a step since this is working on sliding windows and unless you do some extra coding efforts (i don't know the internals of slide() you will miss out on some window(s) spanning the stream boundaries

Sorry did not explain myself clearly. I am looping through one stream with the Stream.slide() method. It appears this creates many windows depending on the window_length parameter. On each window, I run the polarization_analysis method. So, since it may be computing timestamps on the center of a sliding window within polarization_analysis, it makes sense to me that the beginning and end of each window (of the sliding stream) are cut slightly shorter (in terms of polarization_analysis output timestamp, azimuth, etc.) while looping through the stream.

This shouldn't happen tho, somebody should have a look.

Yes, not sure why it's happening. And, I just ran the vidale method on another stream (with my actual data), which worked, but when I tried to run flinn and pm method it gave me this output:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10676/98201681.py in <module>
     19     print(f'\nStart Vidale Polarization Analysis')
     20 
---> 21     pol_an=pa(win, wleng, 0.5, frqlow=100, frqhigh=150, stime=win[0].stats.starttime, 
     22               etime=win[0].stats.endtime, method='pm', verbose=False) #run Vidale polarization; bandpassed 100-150 Hz
     23 

~\Anaconda3\lib\site-packages\obspy\signal\polarization.py in polarization_analysis(stream, win_len, win_frac, frqlow, frqhigh, stime, etime, verbose, method, var_noise, adaptive)
    553     res = np.array(res)
    554 
--> 555     result_dict = {"timestamp": res[:, 0],
    556                    "azimuth": res[:, 1],
    557                    "incidence": res[:, 2]}

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
megies commented 2 years ago

I edited the ticket title, I think that initial question was solved, right? So we can keep this ticket to track these bugs you dug up now :smiling_face_with_tear:

nukenine commented 2 years ago

@megies Thank you for your help! Yes, and thanks for answering the original issue. I am still very much interested in trying to help solve the second issue, as it seems to be failing for not only the example data, but my actual data as well. For some reason, the flinn and pm flags will work sometimes, but not the vidale, and vice versa. Strange. Maybe it is me, but, you did see the vidale fail with the st.read() library stream. Let me know if there's anything else I could try running to better show the issue.

megies commented 2 years ago

Might not have time to look into this.. but we also some other tickets still unresolved on the topic, see https://github.com/obspy/obspy/issues?q=vidale

megies commented 2 years ago

@nukenine you seem to get output timestamps for every single sample.. did you specify that explicitly? Seems unusual to me and maybe thats also why we had reports of the calculations not finishing (https://discourse.obspy.org/t/polarization-examples/1388/8) and some weird behavior of output length reported in #2441..?

nukenine commented 2 years ago

@megies

you seem to get output timestamps for every single sample.. did you specify that explicitly?

I'm not sure, I call the method like this:

result=pa(stc, 2, 0.5, frqlow=1, frqhigh=20, 
         stime=stc[0].stats.starttime, etime=stc[0].stats.endtime, method='vidale', verbose=True)

I want to output timestamps for every single sample, as I want to be able to relate to the original amplitude traces to the az, inc, etc. In fact, since it is computing at the center of the sliding window (as you suggest) it's actually truncating outputs at the beginning and end of the stream (by milliseconds). It looks like @ThomasLecocq confirms there may be a truncation/buffer in #2441

maybe thats also why we had reports of the calculations not finishing

So, that's actually my discourse question that I opened, and after reviewing #2441 it looks like the vidale method specifically does not use the win_len and win_frac arguments:

if method.lower() == "vidale":
        res = vidale_adapt(stream, var_noise, fs, frqlow, frqhigh, spoint,
                           stime, etime)

This indicates to me that there is no sliding window in obspy's implementation of the vidale method, unless I am mistaken? This could explain why when you try to run polarization_analysis with the vidale method on longer duration streams that it takes forever/times out, since it's not being windowed (this happens to me with anything over 30 seconds duration really). And, this is why I started calling the polarization_analysis method within a for window in st.slide(): ... do polarization .... loop, so that I window the data and compute polarization_analysis on each window, which allows it to not time out and actually run normally.

But, this raised the original issue, that since the polarization_analysis method is computing on center times (maybe), then by windowing, it is leaving gaps in timestamps at the beginning and end of each window when I append output results of each window to a container list so that I can compare the entire output after the loop. Does that make sense?

Edit: Tagging @jwassermann since I saw his name as a developer on one of these issues

Thanks!