jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org

Avoid duplication in request body. #15

Closed kura-okubo closed 5 years ago

kura-okubo commented 5 years ago

Hello jpjones,

I tried downloading data from GeoNet, the New Zealand data server, and found duplicated entries in the data query strings, as shown below.


julia> get_data("FDSN", "NZ.NAAS.*.BN?", s="2016-05-20T01:14:25.07", t="2016-05-20T01:24:25.07", v=2, src="GEONET")
[ Info: 2019-06-10T22:12:23.160: Querying FDSN stations
Most compact request form = ["NZ" "NAAS" "*" "BN?" ""]
request url:http://service.geonet.org.nz/fdsnws/station/1/query
request body:
level=response
format=xml
 NZ NAAS * BN? 2016-05-20T01:14:25.07 2016-05-20T01:24:25.07

[ Info: 2019-06-10T22:12:24.644: Building list of channels
data query strings:
NZ NAAS 20 BN2
NZ NAAS 20 BN1
NZ NAAS 20 BNZ
NZ NAAS 20 BN2
NZ NAAS 20 BN1
NZ NAAS 20 BNZ
[ Info: 2019-06-10T22:12:24.652: Data query begins
request url: http://service.geonet.org.nz/fdsnws/dataselect/1/query
request body:
format=miniseed
NZ NAAS 20 BN2 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN1 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BNZ 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN2 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN1 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BNZ 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ.NAAS.20.BN2: resized from length 0 to length 360000
NZ.NAAS.20.BN1: resized from length 0 to length 360000
NZ.NAAS.20.BNZ: resized from length 0 to length 360000
[ Info: 2019-06-10T22:12:37.211: Done FDSNget query.
[ Info: 2019-06-10T22:12:37.211: Removing empty channels.
SeisData with 3 channels (3 shown)
    ID: NZ.NAAS.20.BN2                     NZ.NAAS.20.BN1                     NZ.NAAS.20.BNZ
  NAME: Napier Airport                     Napier Airport                     Napier Airport
   LOC: -39.4687 N, 176.872 E, 2.0 m       -39.4687 N, 176.872 E, 2.0 m       -39.4687 N, 176.872 E, 2.0 m
    FS: 50.0                               50.0                               50.0
  GAIN: 1.01972e5                          1.01972e5                          1.01972e5
  RESP: c = 1.0, 0 zeros, 0 poles          c = 1.0, 0 zeros, 0 poles          c = 1.0, 0 zeros, 0 poles
 UNITS: m/s2                               m/s2                               m/s2
   SRC: http://service.geonet.org.nz/fdsn… http://service.geonet.org.nz/fdsn… http://service.geonet.org.nz/fdsn…
  MISC: 2 entries                          2 entries                          2 entries
 NOTES: 0 entries                          0 entries                          0 entries
     T: 2016-05-20T01:14:24.034 (0 gaps)   2016-05-20T01:14:20.871 (0 gaps)   2016-05-20T01:14:24.515 (0 gaps)
     X: -2.057e+03                         -2.413e+03                         -3.780e+02
        -2.057e+03                         -2.415e+03                         -3.710e+02
            ...                                ...                                ...
        -2.036e+03                         -2.375e+03                         -3.520e+02
        (nx = 61112)                       (nx = 60702)                       (nx = 60444)
     C: 0 open, 0 total

julia>

I suspect this comes from duplicated channel entries in the StationXML downloaded from GeoNet. The duplication causes an error with a lat-lon box request: the server limits the number of channels per request, and the number of request lines becomes much larger than the actual number of channels.
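
To make the request concrete, here is a minimal sketch of the kind of deduplication I mean. It is not SeisIO's code: the variable names (`lines`, `deduped`, `body`) are only for illustration, and the body layout is assumed from the verbose output above.

```julia
# Channel lines copied from the "data query strings" output above.
lines = [
    "NZ NAAS 20 BN2",
    "NZ NAAS 20 BN1",
    "NZ NAAS 20 BNZ",
    "NZ NAAS 20 BN2",
    "NZ NAAS 20 BN1",
    "NZ NAAS 20 BNZ",
]

# `unique` keeps the first occurrence of each entry and preserves order.
deduped = unique(lines)

# Rebuild a request body in the layout shown in the log (assumed, not
# taken from SeisIO's source): a format line, then one line per channel.
s = "2016-05-20T01:14:25.070000"
t = "2016-05-20T01:24:25.070000"
rows = [string(l, " ", s, " ", t) for l in deduped]
pushfirst!(rows, "format=miniseed")
body = join(rows, "\n")

println(body)
```

With this, the request body carries three channel lines instead of six, so a lat-lon box request counts each channel once.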

In addition, it sometimes causes an error as below:

┌ Warning: Error thrown:
│ URL: http://service.geonet.org.nz/fdsnws/dataselect/1/query
│ POST BODY:
│ format=miniseed
│ NZ CCCC 20 BN1 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN2 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BNZ 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BNZ 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN2 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN1 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│
│ ERROR TYPE: HTTP.IOExtras.IOError
└ @ SeisIO ~/.julia/packages/SeisIO/mMCC6/src/Web/0_essentials.jl:58

So I would like to ask whether the get_data function could be modified to avoid these duplicated requests.

Best,

jpjones76 commented 5 years ago

Thank you for the report. I'm looking into this now.

jpjones76 commented 5 years ago

OK, I found the problem. The bad news is that it's definitely an issue with GeoNet. Marine replicated this bug in ObsPy earlier today. The good news is that I know how to fix it. Amazingly enough, my blind guess about the cause is correct:

When GeoNet changes a channel's parameters, they record a startDate attribute for the new XML element, but no endDate attribute is added to the old element. It seems that a channel element with no endDate is then considered valid in the time range -∞:+∞; for example, your query for 2016 returns some channel elements with a startDate of November 2017. I did an identical query through their webpage and got exactly the same results.

Workaround: I can add a control loop to SeisIO that retains one unique entry per channel, based on startDate. This might be messy because I need to test each channel ID for uniqueness, then loop over each group of IDs to create an array of endDate values, then retain the element that's correct for the query window.
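
A rough sketch of that loop, under stated assumptions: `ChannelEpoch` and `select_epochs` are hypothetical names, not SeisIO's API; a missing endDate is inferred from the next startDate for the same channel ID; and the gains and the 2010 start date in the example are made up for illustration.

```julia
using Dates

# Hypothetical record for one channel element in station XML.
# `stop === nothing` stands for a missing endDate attribute.
struct ChannelEpoch
    id::String
    start::DateTime
    stop::Union{DateTime,Nothing}
    gain::Float64
end

# Keep at most one epoch per channel ID: the earliest one that overlaps
# the query window [s, t].
function select_epochs(epochs::Vector{ChannelEpoch}, s::DateTime, t::DateTime)
    # Group the elements by channel ID.
    groups = Dict{String,Vector{ChannelEpoch}}()
    for e in epochs
        push!(get!(groups, e.id, ChannelEpoch[]), e)
    end

    kept = ChannelEpoch[]
    for id in sort!(collect(keys(groups)))
        g = sort(groups[id], by = e -> e.start)
        for (i, e) in enumerate(g)
            # Infer a missing endDate from the next startDate; the last
            # epoch in a group stays open-ended.
            stop = e.stop !== nothing ? e.stop :
                   (i < length(g) ? g[i+1].start : typemax(DateTime))
            if e.start <= t && stop > s
                push!(kept, e)
                break
            end
        end
    end
    return kept
end

# Two elements for one channel, neither with an endDate (illustrative values).
epochs = [
    ChannelEpoch("NZ.NAAS.20.BN1", DateTime(2017, 11, 1), nothing, 2.0e5),
    ChannelEpoch("NZ.NAAS.20.BN1", DateTime(2010, 1, 1), nothing, 1.0e5),
]
kept = select_epochs(epochs, DateTime("2016-05-20T01:14:25"), DateTime("2016-05-20T01:24:25"))
println(length(kept), " epoch kept, starting ", kept[1].start)
```

If a query window spans a parameter change, this sketch keeps only the earlier element; a real fix would need to decide how to handle that case.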

Do you know anyone at GeoNet? Could you encourage them to add endDate values to their station XML? I ask because it's easy to imagine a "use case" where this breaks research: suppose a program reads station XML until the first match of each channel. That's OK for normal station XML, but would yield GeoNet parameters that are outdated and therefore wrong. Now suppose one's research requires correcting to true ground velocity, and the "wrong" parameters include a gain...

(I thought of this because I encountered a very similar "use case" with Win32 data in 2016: JMA, Nagoya University, and HiNet each had their own parameter file for the two JMA stations on Mt. Ontake. No two parameter files agreed. The gain of each seismic channel varied from file to file by ~50%; the gain of each infrasound channel varied by 3-4 orders of magnitude. No one knew which parameters were current.)

I'll add a fix to SeisIO in a few days. At the moment I'm trying to learn why the Julia ecosystem didn't update SeisIO to v0.3.0.

jpjones76 commented 5 years ago

Hi, I implemented a rewrite of FDSN_sta_xml tonight that should include a very clean workaround for this problem. Are you still having this issue, or is it now fixed?

kura-okubo commented 5 years ago

Hello,

Thank you for updating the module. I will retry the download in a couple of days and let you know the result.

Best regards,
