seismo-live / seismo_live

Live Jupyter Notebooks for Seismology
http://seismo-live.org

Added ORFEUS Webservice workshop notebook #16

Closed Jollyfant closed 6 years ago

Jollyfant commented 6 years ago

Hi Lion, here's the bulk of the workshop notebook. Might make some very minor changes tomorrow but I hope not.

Thanks!

Jollyfant commented 6 years ago

Moved + added seismo-live header. The new HTML group seems OK too.

Jollyfant commented 6 years ago

Let me send this version around before merging please. Might be some more minor changes.

Jollyfant commented 6 years ago

@megies this notebook is a demo on how to use EIDA webservices (regardless of ObsPy). So we have most exercises talking to the webservice API directly. But there are also some examples on how to use tools like ObsPy; see 1.2.2.

krischer commented 6 years ago

Before I get started: I'll totally merge this over the weekend so it will be online for the workshop next week. Afterwards (or before if you agree) I'd like to see some changes or if you don't care we can also just remove it again.

After looking at this in a bit more detail I also feel like this encourages "bad practices" to some extent. It is fine to show the URLs but (at least from ObsPy's point of view) we don't want people to use the read() method and manually build up queries (and I have seen a fair number of people do just that because they thought that's just how it works).

Concrete suggestions I have

Given that I have so many complaints I'm also willing to help implement them. Let me know how you want to proceed :)

Jollyfant commented 6 years ago

After looking at this in a bit more detail I also feel like this encourages "bad practices" to some extent. It is fine to show the URLs but (at least from ObsPy's point of view) we don't want people to use the read() method and manually build up queries (and I have seen a fair number of people do just that because they thought that's just how it works).

I would not call it bad practice as much as it is an inconvenient way to do things; ObsPy supports reading from a URL after all. I want to emphasize again this will not be a session about ObsPy. We will use it extensively however, because it is a great tool and it helps with data visualization. Principally this workshop is meant as an introduction for an audience that has little experience with webservices. I know this all seems very trivial to us as devs but some participants have never worked with an API before. I think it is important to show it at a conceptual level and not just hand the users a tool (.Client) without a conceptual understanding of what they're holding. At the end we are always using examples that use the ObsPy client, and people should think 💡 "oh, this is easier.." -- I think it's clear ObsPy is the winner in this case.

I'm actually not sure if the routing (in general) is correct via the get return format. I've wondered for a while but I believe it (on the service level already) could only work if the router works on a per-network code basis (but if it does all the other arguments would be pointless). With the get return format it could for example never route two stations in one network to separate data centers in general. Let me know if this makes sense - otherwise I'll come up with an illustrative example.

I think you mean a request like this? It should be fine:

http://www.orfeus-eu.org/eidaws/routing/1/query?sta=A368A,A359A,A339A&format=get

http://eida.bgr.de/fdsnws/dataselect/1/query?end=2020-07-01T00:00:00&sta=A368A&start=2015-01-01T00:00:00&net=Z3
http://geofon.gfz-potsdam.de/fdsnws/dataselect/1/query?end=2020-07-01T00:00:00&sta=A359A&start=2015-09-07T00:00:00&net=Z3
http://www.orfeus-eu.org/fdsnws/dataselect/1/query?sta=A339A&start=2015-01-01T00:00:00&net=Z3
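To make the routing result concrete, here is a minimal sketch that takes the three dataselect URLs quoted above as the routed output and groups the requested stations by data center host. It uses only the Python standard library; the function name `group_by_data_center` is hypothetical, and treating those three lines as the literal `format=get` response is an assumption.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse

# The three routed dataselect URLs quoted in the thread (assumed to be the
# routing service's answer for the request above).
routed_urls = [
    "http://eida.bgr.de/fdsnws/dataselect/1/query?end=2020-07-01T00:00:00&sta=A368A&start=2015-01-01T00:00:00&net=Z3",
    "http://geofon.gfz-potsdam.de/fdsnws/dataselect/1/query?end=2020-07-01T00:00:00&sta=A359A&start=2015-09-07T00:00:00&net=Z3",
    "http://www.orfeus-eu.org/fdsnws/dataselect/1/query?sta=A339A&start=2015-01-01T00:00:00&net=Z3",
]


def group_by_data_center(urls):
    """Hypothetical helper: map each data center host to the stations routed to it."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlparse(url)
        params = parse_qs(parts.query)
        # A host may appear in several URLs, so stations accumulate per host.
        for value in params.get("sta", []):
            groups[parts.netloc].extend(value.split(","))
    return dict(groups)


stations_by_host = group_by_data_center(routed_urls)
for host, stations in stations_by_host.items():
    print(host, stations)
```

Each station of network Z3 ends up at a different host here, and the grouping also copes with several URLs pointing at the same data center.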

I also think our new routing client is actually quite nice to use so it would be great to showcase it.

I agree it's a nice tool and would like to but the focus is not ObsPy and we are limited to 50 minutes.

The last example is particularly confusing again from the ObsPy point of view. It mixes ObsPy with custom date formatting and other date types and manually builds the requests. This would be a lot simpler by fully staying in ObsPy. Especially our UTCDateTime class is really made for these kinds of calculations.

Again I want to stay as close as possible to the APIs. Not all webservices are currently supported by ObsPy clients (WFCatalog) anyway. If you can point me to improvements on the way dates are handled that would be appreciated.
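For reference, the kind of manual date handling under discussion might look roughly like the sketch below, written with the standard library only. The notebook's actual code is not shown in this thread, so the function name `fdsn_window`, the window lengths, and the choice to treat naive datetimes as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def fdsn_window(origin, before_s, after_s):
    """Hypothetical helper: return (start, end) as ISO 8601 strings in UTC."""
    if origin.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        origin = origin.replace(tzinfo=timezone.utc)
    origin = origin.astimezone(timezone.utc)
    start = origin - timedelta(seconds=before_s)
    end = origin + timedelta(seconds=after_s)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)


origin = datetime(2017, 10, 20, 20, 35, 39, tzinfo=timezone.utc)
window = fdsn_window(origin, 60, 300)
print(window)
```

The explicit time zone handling and the `timedelta` objects are the parts that need care when dates are handled by hand.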

There are a couple functions in obspy.geodetics that can do versions of what the haversine() function is doing.
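The notebook's own haversine() is not shown in this thread, so the following is only a sketch of what such a function typically computes: the great-circle distance between two points on a sphere. The mean Earth radius of 6371 km and the function signature are assumptions. ObsPy's `obspy.geodetics` module offers functions such as `gps2dist_azimuth` and `locations2degrees` for related calculations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the way around the equator.
quarter = haversine(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```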

That would be a welcome change. I'll have a look over the weekend.

The workshop is next Thursday. If the notebook doesn't fit your philosophy I understand and looking around you have plenty of good notebooks already. I'm grateful enough you offered to host it on seismo-live during the workshop. We can ask participants to open containers and have them upload the notebook themselves.

krischer commented 6 years ago

I would not call it bad practice as much as it is an inconvenient way to do things; ObsPy supports reading from a URL after all. I want to emphasize again this will not be a session about ObsPy. We will use it extensively however, because it is a great tool and it helps with data visualization. Principally this workshop is meant as an introduction for an audience that has little experience with webservices. I know this all seems very trivial to us as devs but some participants have never worked with an API before. I think it is important to show it at a conceptual level and not just hand the users a tool (.Client) without a conceptual understanding of what they're holding. At the end we are always using examples that use the ObsPy client, and people should think 💡 "oh, this is easier.." -- I think it's clear ObsPy is the winner in this case.

Okay, fair enough. A different point of view I guess. Could you maybe add a box somewhere in the beginning of the notebook to state this in like a sentence or two?

Again I want to stay as close as possible to the APIs. Not all webservices are currently supported by ObsPy clients (WFCatalog) anyway. If you can point me to improvements on the way dates are handled that would be appreciated.

Just use the UTCDateTime() objects. They can be initialized with almost anything representing a time object, you no longer have to worry about time zones, and arithmetic can be done with simple integers (no need to use a timedelta object).

Time handling: http://vm-141-40-254-12.cloud.mwn.de:8000/user/xxxx/notebooks/ObsPy/02_UTCDateTime-with_solutions.ipynb

Similar exercise to yours already using the time objects: http://vm-141-40-254-12.cloud.mwn.de:8000/user/xxxx/notebooks/ObsPy/07_Basic_Processing_Exercise-with_solutions.ipynb

(Not sure if these links always work).

The workshop is next Thursday. If the notebook doesn't fit your philosophy I understand and looking around you have plenty of good notebooks already. I'm grateful enough you offered to host it on seismo-live during the workshop. We can ask participants to open containers and have them upload the notebook themselves.

No, that's fine. seismo-live explicitly has a broader scope than ObsPy! Can you remind me next Wednesday to up the number of seismo-live containers for a day?

I think you mean a request like this? It should be fine:

http://www.orfeus-eu.org/eidaws/routing/1/query?sta=A368A,A359A,A339A&format=get

Not exactly the scenario I had in mind but I just realized that the routing service also can return more than one GET query for the same data center so my concern was not valid! :)

megies commented 6 years ago

Principally this workshop is meant as an introduction for an audience that has little experience with webservices

Even more reason to just use a high-level API, I'd argue. Sorry, but I really don't get the point about encouraging people to build up URLs manually.

I agree it's a nice tool and would like to but the focus is not ObsPy and we are limited to 50 minutes.

I think it's way faster to explain client.get_waveforms(..) than how to reformat your query URL strings..

Could you maybe add a box somewhere in the beginning of the notebook to state this in like a sentence or two?

I guess the main point about this one is that people might take it as a template for their programs in high-level programming when looking at this, while it's supposed to just show how low-level FDSNWS URL endpoints work.

@krischer's proposal sounds like a viable solution. We just need to be sure that this whole notebook doesn't leave the impression that it's showing the recommended usage. So having a big red warning at top and bottom is needed I think.

I'm just still a bit concerned because from my experience, some people will still pick this up for their workflows.. no matter how hard you mention the better solutions.

megies commented 6 years ago

Thinking about this more.. the debug option might be the best solution, no?

You could do the manual URL building in the first example and in all following examples use the debug=True option, which also shows the URLs..

In [1]: from obspy.clients.fdsn import Client

In [2]: client = Client(debug=True)
Installed new opener with handlers: [<obspy.clients.fdsn.client.CustomRedirectHandler instance at 0x7f61c10a33f8>]
Base URL: http://service.iris.edu
Request Headers: {u'User-Agent': u'ObsPy/1.1.0rc7.post0+25.g186385ee13.obspy.read.isf (Linux-4.9.0-0.bpo.3-amd64-x86_64-with-debian-8.8, Python 2.7.14)'}
Downloading http://service.iris.edu/fdsnws/dataselect/1/application.wadl with requesting gzip compression
Downloading http://service.iris.edu/fdsnws/event/1/application.wadl with requesting gzip compression
Downloading http://service.iris.edu/fdsnws/station/1/application.wadl with requesting gzip compression
Downloading http://service.iris.edu/fdsnws/event/1/catalogs with requesting gzip compression
Downloading http://service.iris.edu/fdsnws/event/1/contributors with requesting gzip compression
Uncompressing gzipped response for http://service.iris.edu/fdsnws/dataselect/1/application.wadl
Uncompressing gzipped response for http://service.iris.edu/fdsnws/station/1/application.wadl
Uncompressing gzipped response for http://service.iris.edu/fdsnws/event/1/application.wadl
Downloaded http://service.iris.edu/fdsnws/dataselect/1/application.wadl with HTTP code: 200
Downloaded http://service.iris.edu/fdsnws/station/1/application.wadl with HTTP code: 200
Downloaded http://service.iris.edu/fdsnws/event/1/application.wadl with HTTP code: 200
Downloaded http://service.iris.edu/fdsnws/event/1/catalogs with HTTP code: 200
Uncompressing gzipped response for http://service.iris.edu/fdsnws/event/1/contributors
Downloaded http://service.iris.edu/fdsnws/event/1/contributors with HTTP code: 200
Discovered dataselect service
Discovered station service
Discovered event service
Storing discovered services in cache.

In [3]: from obspy import UTCDateTime; t = UTCDateTime()

In [4]: client.get_waveforms('IU', 'ANMO', '00', 'BHZ', t-1000, t-900)
Downloading http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&location=00&starttime=2017-10-20T20%3A35%3A39.101644&endtime=2017-10-20T20%3A37%3A19.101644&channel=BHZ without requesting gzip compression
Downloaded http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&location=00&starttime=2017-10-20T20%3A35%3A39.101644&endtime=2017-10-20T20%3A37%3A19.101644&channel=BHZ with HTTP code: 200
Out[4]: 
1 Trace(s) in Stream:
IU.ANMO.00.BHZ | 2017-10-20T20:35:39.119538Z - 2017-10-20T20:37:19.069538Z | 20.0 Hz, 2000 samples

Jollyfant commented 6 years ago

I really disagree that building URLs is not recommended usage. That is just how an API works. I see the point from your perspective but on the other hand, you risk the chance of people only understanding how to use ObsPy to interact with our webservices. There are many other tools that people use that are not ObsPy.
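As a rough illustration of what building a query URL by hand involves, the sketch below assembles the same fdsnws-dataselect request that appears in the debug output earlier in the thread, using only the Python standard library. The helper name `build_dataselect_url` is hypothetical; the base URL, path, and parameter names are copied from that debug log.

```python
from urllib.parse import urlencode


def build_dataselect_url(base_url, **params):
    """Hypothetical helper: base URL + dataselect path + encoded query string."""
    # urlencode percent-encodes reserved characters, e.g. ":" becomes "%3A".
    return f"{base_url}/fdsnws/dataselect/1/query?{urlencode(params)}"


url = build_dataselect_url(
    "http://service.iris.edu",
    network="IU",
    station="ANMO",
    location="00",
    starttime="2017-10-20T20:35:39.101644",
    endtime="2017-10-20T20:37:19.101644",
    channel="BHZ",
)
print(url)
```

The resulting string can be pasted into a browser or passed to any HTTP tool, which is the tool-independent aspect argued for here.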

Personally I never use Client in my workflows because it takes away control over the requests. I'm not inclined to put a warning on top of the notebook.

Edit @krischer sorry missed your comments - I will add some explanation on top of the notebook and maybe some hints below the exercises. Will update the UTCDateTime too.

krischer commented 6 years ago

Great :) It's always interesting to get people with differing points of view.

I'll merge this now in any case as I have to restart seismo-live right now for another workshop. Then you can try it online. Please just add all changes in a separate PR.

Personally I never use Client in my workflows because it takes away control over the requests.

I'm kind of interested in what situation you feel that the Clients take away control - I more or less view them as a straight 1-to-1 mapping.

megies commented 6 years ago

Alright, in any case there's one example now to use the obspy client, so I guess that should be good enough for people to recognize that option.