obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.
https://www.obspy.org

FDSN routing client issues with INGV hosted data #2364

Closed flofux closed 5 years ago

flofux commented 5 years ago

Dear all,

I discovered this behaviour, but I'm not sure whether it's an ObsPy issue, related to the EIDA routing, or some webservice issue at INGV. It occurs only when accessing metadata that is hosted at INGV (e.g. the IV network; the behavior is the same for any other network hosted at INGV).

Basically, when the level of the get_stations request is not specified, no stations are returned for any network hosted at INGV. Stations are returned only when the level is specified (e.g. level='channel'). Specifying stations, locations or channels does not change the behavior. Direct requests without routing (using obspy.clients.fdsn.Client and specifying INGV) that do not specify the level also work fine for any INGV-hosted data.

from obspy.clients.fdsn import RoutingClient
from obspy import UTCDateTime

routed = RoutingClient('eida-routing')
t1 = UTCDateTime('2016-01-01T00:00:00')
t2 = UTCDateTime('2019-01-01T00:00:00')

The following line returns an empty inventory object:
routed.get_stations(network='IV', starttime=t1, endtime=t2)

Only when the level is specified does the request work:
routed.get_stations(network='IV', starttime=t1, endtime=t2, level='station')

Again, this occurs only when requesting metadata that is hosted at INGV. Metadata hosted elsewhere is returned successfully even when level is not specified. Hence my confusion about whether it's actually related to ObsPy ...

Best regards, Florian

megies commented 5 years ago

When you use RoutingClient('eida-routing', debug=True) you can see what is sent to the server. If level is not specified there are some header lines in the POST data missing. Looks like the level is not part of these header lines though and they could be included always, so should be an easy fix. No idea what the EIDAWS protocol says about these header lines or what happens when they are not included.

flofux commented 5 years ago

I checked the output with debug=True, but I cannot spot any differences in the requests. The requests to INGV look identical to the ones sent to other datacenters. Only INGV gives back the following error:

Sending along the following payload:
IV * * * 2016-01-01T00:00:00 2019-01-01T00:00:00
HTTP error 400, reason Bad Request, while downloading 'http://webservices.ingv.it/fdsnws/station/1/query': Error 400

Bad Request:
 doIngvProcessing - Unknown query parameters: IV_*_*_*_2016-01-01T00:00:00_2019-01-01T00:00:00

Request:
 http://webservices.ingv.it/fdsnws/station/1/query

Input request (POST):
IV * * * 2016-01-01T00:00:00 2019-01-01T00:00:00

Request Submitted:
 2019-03-22T11:03:26 UTC

Service version:
 1.1.40.3

I don't know what doIngvProcessing is doing, but it may be causing the behavior. But this looks more like an INGV problem rather than an ObsPy issue ...

Still, ObsPy could by default add the level='station' option to any request?

megies commented 5 years ago

I cannot spot any differences in the requests

On current master for me using this script:

from obspy.clients.fdsn import RoutingClient
from obspy import UTCDateTime

routed = RoutingClient('eida-routing', debug=True)

start = UTCDateTime('2016-01-01T00:00:00')
end = UTCDateTime('2018-12-31T00:00:00')

kwargs = dict(network='IV', starttime=start, endtime=end)
print(len(routed.get_stations(**kwargs)))
print('#' * 50)
print(len(routed.get_stations(level='station', **kwargs)))

Note the different payload on the nested request issued by the routed client.

Downloading http://www.orfeus-eu.org/eidaws/routing/1/query ...
Sending along the following payload:
----------------------------------------------------------------------
service=station
format=post
alternative=false
IV * * * 2016-01-01T00:00:00.000000 2018-12-31T00:00:00.000000
----------------------------------------------------------------------
Installed new opener with handlers: [<obspy.clients.fdsn.client.CustomRedirectHandler instance at 0x7f6f78782c20>]
Base URL: http://webservices.ingv.it
Request Headers: {u'User-Agent': u'ObsPy/1.1.1.post0+591.g1a7170767f.obspy.master (Linux-4.9.0-6-amd64-x86_64-with-debian-9.7, Python 2.7.14)'}
Downloading http://webservices.ingv.it/fdsnws/dataselect/1/application.wadl with requesting gzip compression
Downloading http://webservices.ingv.it/fdsnws/event/1/application.wadl with requesting gzip compression
Downloading http://webservices.ingv.it/fdsnws/station/1/application.wadl with requesting gzip compression
Downloading http://webservices.ingv.it/fdsnws/event/1/catalogs with requesting gzip compression
Downloading http://webservices.ingv.it/fdsnws/event/1/contributors with requesting gzip compression
Uncompressing gzipped response for http://webservices.ingv.it/fdsnws/station/1/application.wadl
Downloaded http://webservices.ingv.it/fdsnws/station/1/application.wadl with HTTP code: 200
Uncompressing gzipped response for http://webservices.ingv.it/fdsnws/dataselect/1/application.wadl
Downloaded http://webservices.ingv.it/fdsnws/dataselect/1/application.wadl with HTTP code: 200
Uncompressing gzipped response for http://webservices.ingv.it/fdsnws/event/1/application.wadl
Downloaded http://webservices.ingv.it/fdsnws/event/1/application.wadl with HTTP code: 200
Downloaded http://webservices.ingv.it/fdsnws/event/1/contributors with HTTP code: 200
Downloaded http://webservices.ingv.it/fdsnws/event/1/catalogs with HTTP code: 200
Discovered station service
Discovered dataselect service
Discovered event service
Storing discovered services in cache.
Downloading http://webservices.ingv.it/fdsnws/station/1/query with requesting gzip compression
Sending along the following payload:
----------------------------------------------------------------------
IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00
----------------------------------------------------------------------
HTTP error 400, reason Bad Request, while downloading 'http://webservices.ingv.it/fdsnws/station/1/query': Error 400

Bad Request: 
 doIngvProcessing - Unknown query parameters: IV_*_*_*_2016-01-01T00:00:00_2018-12-31T00:00:00

Request: 
 http://webservices.ingv.it/fdsnws/station/1/query

Input request (POST): 
IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00

Request Submitted: 
 2019-03-22T15:03:02 UTC

Service version: 
 1.1.40.3
0
##################################################
Downloading http://www.orfeus-eu.org/eidaws/routing/1/query ...
Sending along the following payload:
----------------------------------------------------------------------
service=station
format=post
alternative=false
IV * * * 2016-01-01T00:00:00.000000 2018-12-31T00:00:00.000000
----------------------------------------------------------------------
Installed new opener with handlers: [<obspy.clients.fdsn.client.CustomRedirectHandler instance at 0x7f6f75eda2d8>]
Base URL: http://webservices.ingv.it
Request Headers: {u'User-Agent': u'ObsPy/1.1.1.post0+591.g1a7170767f.obspy.master (Linux-4.9.0-6-amd64-x86_64-with-debian-9.7, Python 2.7.14)'}
Loading discovered services from cache.
Downloading http://webservices.ingv.it/fdsnws/station/1/query with requesting gzip compression
Sending along the following payload:
----------------------------------------------------------------------
level=station
IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00
----------------------------------------------------------------------
Uncompressing gzipped response for http://webservices.ingv.it/fdsnws/station/1/query
Downloaded http://webservices.ingv.it/fdsnws/station/1/query with HTTP code: 200
1

I don't know what doIngvProcessing is doing, but it may be causing the behavior. But this looks more like an INGV problem rather than an ObsPy issue ...

Still, ObsPy could by default add the level='station' option to any request?

Not sure.. in any case doIngvProcessing is server side, yes. To me it looks like it might be a server side issue, since the level parameter should default to "station" when not given according to FDSNWS specs, even for POST usage, I think? Anyway, yep, we can be explicit on our end and avoid the trouble altogether, not relying on default FDSNWS parameter settings.
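A minimal sketch of what "being explicit on our end" could look like, for illustration only. `build_station_post_body` is a hypothetical helper, not ObsPy's actual implementation; it assumes the POST body consists of optional key=value header lines followed by the selection lines, as in the debug output above, and it always writes a level line.

```python
def build_station_post_body(selection_lines, level=None, **params):
    """Build a station-service POST body that always carries level=...

    Hypothetical helper: `selection_lines` are strings such as
    "IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00"; any extra
    keyword arguments become additional key=value header lines.
    """
    header = dict(params)
    # Fall back to "station", the default level named in the thread,
    # instead of omitting the line and relying on the server default.
    header["level"] = level or "station"
    lines = ["{}={}".format(key, value) for key, value in sorted(header.items())]
    lines.extend(selection_lines)
    return "\n".join(lines) + "\n"


body = build_station_post_body(
    ["IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00"])
print(body)
```

With this approach the body never starts directly with a selection line, which is the case that triggers the 400 response shown in the log.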

petrrr commented 5 years ago

Hi! AFAIK the INGV’s FDSN service also defaults to level station. So that should not be the issue. I’d need to have a closer look on this but I am now on the train so it’s a bit difficult.

Anyway if the service is not standard compliant or there is a server side bug, this will be fixed server side.

~petr

TraziBatine commented 5 years ago

Seems like INGV data access works again ...

megies commented 5 years ago

Seems like INGV data access works again ...

What is "working again"? I think the problem still persists.

AFAIK the INGV’s FDSN service also defaults to level station. So that should not be the issue. I’d need to have a closer look on this but I am now on the train so it’s a bit difficult. Anyway if the service is not standard compliant or there is a server side bug, this will be fixed server side.

The station service defaults to level=station when used via REST, but it does not default to level=station when used with POST (instead giving this cryptic error message). I am pretty sure this has to be considered a server side issue, but as mentioned before, we could still work around it from our side if we decide to (by being verbose and setting "level=station" in POST requests for which the user did not select a non-default level).

Actually, I just tested and it's not even the level=.. setting that is the problem but rather any POST request fails that does not have at least one key=value pair at the start.

@petrrr this should be fixed server side, we can leave this ticket open as a reminder but you can close it once it's fixed on the server.

petrrr commented 5 years ago

@megies: Thanks for tracking down this issue to that detail.

Indeed, it looks like the service implementation wrongly assumes that there is always at least one query parameter in the header. If that is missing, the first line of the payload is parsed against the key=value pattern and causes the error.

I'll pass this issue to the maintainer of the project and track this issue.
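For illustration, a minimal sketch of a POST body parser that tolerates a missing header. This is not INGV's implementation; `parse_post_body` is a hypothetical function, and it assumes that selection lines never contain an "=" character, so that any line with "=" can be treated as a key=value parameter.

```python
def parse_post_body(body):
    """Split a POST body into (params, selections).

    Hypothetical parser: lines containing "=" are treated as
    key=value parameters, every other non-empty line as a
    whitespace-separated selection line.
    """
    params = {}
    selections = []
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
        else:
            # No "=" present: this is a selection line, even if it is
            # the very first line of the body.
            selections.append(line.split())
    return params, selections


print(parse_post_body("IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00\n"))
```

A body without any header line then simply yields an empty parameter dictionary instead of an error.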

petrrr commented 5 years ago

Actually, I just tested and it's not even the level=.. setting that is the problem but rather any POST request fails that does not have at least one key=value pair at the start.

@megies: I was trying to reproduce the problem, but was not able to do so.

I used our Swagger builder: http://webservices.ingv.it/fdsnws/station/1 as well as curl.

petr% curl -X POST "http://webservices.ingv.it/fdsnws/station/1/query" -H "accept: application/xml" -H "Content-Type: text/plain" -d "IV ACER * * 2016-01-01T00:00:00 2019-01-01T00:00:00"
<?xml version="1.0" encoding="UTF-8"?>
<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.0" xsi:schemaLocation="http://www.fdsn.org/xml/station/1 http://www.fdsn.org/xml/station/fdsn-station-1.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ingv="https://raw.githubusercontent.com/FDSN/StationXML/master/fdsn-station.xsd"><Source>SeisNet-mysql</Source><Sender>INGV-CNT</Sender><Module>INGV-CNT WEB SERVICE: fdsnws-station | version: 1.1.40.3</Module><ModuleURI>http://webservices.ingv.it/fdsnws/station/1/query</ModuleURI><Created>2019-03-26T19:35:07</Created><Network code="IV" startDate="1988-01-01T00:00:00" restrictedStatus="open"><Description>Italian Seismic Network</Description><ingv:Identifier>N1</ingv:Identifier><TotalNumberStations>558</TotalNumberStations><SelectedNumberStations>1</SelectedNumberStations><Station code="ACER" startDate="2007-07-05T12:00:00" restrictedStatus="open"><ingv:Identifier>S1</ingv:Identifier><Latitude>40.7867</Latitude><Longitude>15.9427</Longitude><Elevation>690</Elevation><Site><Name>Acerenza</Name></Site><CreationDate>2007-07-05T12:00:00</CreationDate></Station></Network></FDSNStationXML>

Would you mind providing the exact test you are performing, so that I can reproduce the problem you found? Thanks!

megies commented 5 years ago

To reproduce:

file called "payload":

IV * * * 2016-01-01T00:00:00.000000 2018-12-31T00:00:00.000000

then curl:

$ curl --data-binary "@payload" -X POST 'http://webservices.ingv.it/fdsnws/station/1/query'

result

Error 400

Bad Request: 
 doIngvProcessing - Unknown query parameters: IV_*_*_*_2016-01-01T00:00:00_000000_2018-12-31T00:00:00_000000

Request: 
 http://webservices.ingv.it/fdsnws/station/1/query

Input request (POST): 
IV * * * 2016-01-01T00:00:00.000000 2018-12-31T00:00:00.000000

Request Submitted: 
 2019-03-28T10:03:53 UTC

Service version: 
 1.1.40.3(default) 

petrrr commented 5 years ago

Okay, seems to be an encoding issue:

petr% curl --data-binary "@payload" -X POST 'http://webservices.ingv.it/fdsnws/station/1/query'

Error 400

Bad Request: 
 doIngvProcessing - Unknown query parameters: IV_*_*_*_2016-01-01T00:00:00_000000_2018-12-31T00:00:00_000000

Request: 
 http://webservices.ingv.it/fdsnws/station/1/query

Input request (POST): 
IV * * * 2016-01-01T00:00:00.000000 2018-12-31T00:00:00.000000

Request Submitted: 
 2019-03-28T16:03:09 UTC

Service version: 
 1.1.40.3

while this works:

petr% curl --data-binary "@payload" -X POST 'http://webservices.ingv.it/fdsnws/station/1/query' -H "Content-Type: text/plain"
<?xml version="1.0" encoding="UTF-8"?>
<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.0" xsi:schemaLocation="http://www.fdsn.org/xml/station/1 http://www.fdsn.org/xml/station/fdsn-station-1.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ingv="https://raw.githubusercontent.com/FDSN/StationXML/master/fdsn-station.xsd"><Source>SeisNet-mysql</Source><Sender>INGV-CNT</Sender><Module>INGV-CNT WEB SERVICE: fdsnws-station | version: 1.1.40.3</Module><ModuleURI>http://webservices.ingv.it/fdsnws/station/1/query</ModuleURI><Created>2019-03-28T17:01:23</Created><Network code="IV" startDate="1988-01-01T00:00:00" restrictedStatus="open"><Description>Italian Seismic Network</Description><ingv:Identifier>N1</ingv:Identifier><TotalNumberStations>558</TotalNumberStations><SelectedNumberStations>1</SelectedNumberStations><Station code="ACER" startDate="2007-07-05T12:00:00" restrictedStatus="open"><ingv:Identifier>S1</ingv:Identifier><Latitude>40.7867</Latitude><Longitude>15.9427</Longitude><Elevation>690</Elevation><Site><Name>Acerenza</Name></Site><CreationDate>2007-07-05T12:00:00</CreationDate></Station></Network></FDSNStationXML>

petrrr commented 5 years ago

So this seems to be an encoding issue. Is the service expected to accept URL encoded payload? The specs do not mention this explicitly.

krischer commented 5 years ago

Hmm - that's a tough question. But given that every other service seems to expect urlencoded payloads (and even curl defaults to it), I'd say it is currently the better option, and maybe nobody ever considered plain encoding? Otherwise there'd need to be something in the WADL files so we can distinguish the different services. Probably worthwhile to contact the FDSN to add the url encoding to the standard?

petrrr commented 5 years ago

I would like to update you on this issue and correct some of my previous statements. After some debugging and conversations with my colleagues, these are the findings.

The problem is actually exactly the opposite of what the comments https://github.com/obspy/obspy/issues/2364#issuecomment-477687240 and https://github.com/obspy/obspy/issues/2364#issuecomment-482253360 imply.

The payload is never encoded in any special way; it is just sent as plain text, as defined in the FDSN standard document. However, the client seems to claim, by means of the Content-Type header, that the payload is application/x-www-form-urlencoded, i.e. HTML form encoded (probably just by default). The payload is not encoded this way, and the content is (mis)interpreted by the framework used.
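To illustrate how a plain-text payload can be misread when it is treated as form data, here is a small sketch using Python's standard `urllib.parse.parse_qsl`. It only stands in for a generic form decoder; the framework actually used by the INGV service is not named in this thread and may behave differently in detail. `decode_as_form` is a hypothetical wrapper.

```python
from urllib.parse import parse_qsl


def decode_as_form(body):
    """Decode `body` as if it were application/x-www-form-urlencoded."""
    # keep_blank_values=True keeps keys that have no "=value" part.
    return parse_qsl(body, keep_blank_values=True)


no_header = "IV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00"
with_header = "level=station\n" + no_header

# Without a header line, the whole selection line becomes one
# parameter name with an empty value.
print(decode_as_form(no_header))

# With a header line, the decoder sees a single known parameter
# ("level") whose value contains the rest of the body.
print(decode_as_form(with_header))
```

Under this decoder the request without a header line turns into a single unknown parameter name, which is at least consistent with the "Unknown query parameters" error above, while the request starting with level=station yields a parameter name the service knows.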

megies commented 5 years ago

@petrrr we do not set Content-Type header explicitly and I did not see requests set it either. And both obspy and curl were affected, so I don't think it was a client side issue. Plus, it seems to be resolved now with no changes on our side.. :smirk:

petrrr commented 5 years ago

@megies: I confirm we have "fixed" the issue on the service side.

However, it is in fact the missing Content-Type header which seems somewhat problematic here. Apparently, if this header is missing, the default content type, namely application/x-www-form-urlencoded, should be assumed. But the payload of the POST request clearly is not encoded that way and in some cases cannot be decoded assuming application/x-www-form-urlencoded.

We recognize that (1) the FDSN standard is probably not very clear/correct here and might need some clarification or requesting setting a particular type, (2) that currently other services behave differently, (3) other clients seem to behave as ObsPy does.

Still, it would be more correct to set the Content-Type header in a way that corresponds to the format of the payload. But I guess we need to bring this up at FDSN.

megies commented 5 years ago

It'd be trivial to set content type for the http POST request, if anybody feels they know what exactly should be in there.. (Content-Type: text/plain; charset="ASCII"?)
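A minimal sketch of setting the header with the standard library, assuming plain `text/plain` (the value that worked in the curl test above). Whether a charset should be added is still open, so it is left out here. `make_station_post` is a hypothetical helper and the request is only constructed, not sent.

```python
from urllib.request import Request

STATION_URL = "http://webservices.ingv.it/fdsnws/station/1/query"


def make_station_post(body, url=STATION_URL, content_type="text/plain"):
    """Build (but do not send) a POST request with an explicit Content-Type.

    Hypothetical helper; the default content type is an assumption
    taken from the curl test in this thread.
    """
    data = body.encode("ascii")
    return Request(url, data=data,
                   headers={"Content-Type": content_type},
                   method="POST")


request = make_station_post(
    "level=station\nIV * * * 2016-01-01T00:00:00 2018-12-31T00:00:00\n")
# urllib stores header names capitalized as "Content-type".
print(request.get_method(), request.get_header("Content-type"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. Without an explicit header, Python 3's urllib adds `application/x-www-form-urlencoded` on its own for requests that carry data.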