pangaea-data-publisher / pangaeapy

PANGAEA Python Client
https://www.pangaea.de/
GNU General Public License v3.0

Rewrite parts of data download to simplify especially error handling: Use content negotiation #30

Closed uschindler closed 1 year ago

uschindler commented 2 years ago

I reviewed the current code for downloading datasets and found that it uses a lot of if/then/else logic and parses XML files to figure out whether datasets are freely accessible or whether they are parents. It does this because it needs to guess the data type. It also looks like the code tries not to hammer PANGAEA with useless requests, but that is no problem at all: the response saying that a content type is not supported is cheap, and the HTTP status code comes back fast. I'd do the data download like this:

This should always return the normal tab-separated-values format. There is no need to cross-check the Content-Type of the response or anything like that. The download code should only look at the status code:
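A minimal sketch of a status-code-only download follows. The `Accept: text/tab-separated-values` header, the DOI base URL, the helper names and the mapping of status codes to outcomes are assumptions for illustration, based on general HTTP semantics rather than on PANGAEA documentation.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Assumption: datasets resolve under this DOI prefix.
DOI_BASE = "https://doi.pangaea.de/10.1594/PANGAEA."


def build_data_request(dataset_id: int) -> urllib.request.Request:
    """Build a request that asks for tab-separated values via content negotiation."""
    return urllib.request.Request(
        f"{DOI_BASE}{dataset_id}",
        # Assumption: the IANA media type for tab-separated values is the one to request.
        headers={"Accept": "text/tab-separated-values"},
    )


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse outcome (hypothetical mapping)."""
    if status == 200:
        return "ok"
    if status in (401, 403):
        return "access_restricted"
    if status == 404:
        return "not_found"
    if status == 406:
        # The server cannot deliver the requested media type for this resource.
        return "format_not_available"
    return "error"


def download_tsv(dataset_id: int, timeout: float = 30.0) -> Tuple[str, Optional[str]]:
    """Return (outcome, text); text is None unless the outcome is "ok"."""
    try:
        with urllib.request.urlopen(build_data_request(dataset_id), timeout=timeout) as resp:
            outcome = classify_status(resp.status)
            # Assumption: the payload is UTF-8 encoded.
            text = resp.read().decode("utf-8") if outcome == "ok" else None
            return outcome, text
    except urllib.error.HTTPError as err:
        # No XML parsing or pre-checks: the status code alone decides the outcome.
        return classify_status(err.code), None
```

With this shape, the caller branches on the returned outcome string instead of inspecting metadata before the download.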

If you want to get the native PANGAEA metadata in panmd format, please DO NOT use OAI-PMH (I think pangaear does this; I'm not sure about pangaeapy). The native PANGAEA metadata can and should also be retrieved by content negotiation: Accept: application/vnd.pangaea.metadata+xml
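As a sketch of the metadata request, the `Accept` value below is the one given above; the DOI base URL and the function names are assumptions for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: datasets resolve under this DOI prefix.
DOI_BASE = "https://doi.pangaea.de/10.1594/PANGAEA."
PANMD_TYPE = "application/vnd.pangaea.metadata+xml"


def build_metadata_request(dataset_id: int) -> urllib.request.Request:
    """Ask for the native panmd metadata via content negotiation."""
    return urllib.request.Request(
        f"{DOI_BASE}{dataset_id}",
        headers={"Accept": PANMD_TYPE},
    )


def fetch_metadata(dataset_id: int, timeout: float = 30.0) -> ET.Element:
    """Return the parsed root element of the metadata document.

    A non-2xx response raises urllib.error.HTTPError, so the caller
    can decide what to do based on the status code alone.
    """
    with urllib.request.urlopen(build_metadata_request(dataset_id), timeout=timeout) as resp:
        return ET.fromstring(resp.read())
```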

And finally to get the citation string use: Accept: text/x-bibliography (the default charset is always UTF-8). The current code does not parse any charset parameter on the content-type.
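A sketch of the citation request with charset handling could look like this. The `Accept` value and the UTF-8 default come from the text above; the DOI base URL and the helper names are assumptions for illustration.

```python
import urllib.request
from email.message import Message
from typing import Optional

# Assumption: datasets resolve under this DOI prefix.
DOI_BASE = "https://doi.pangaea.de/10.1594/PANGAEA."


def charset_from_content_type(value: Optional[str], default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    if value is not None:
        msg["Content-Type"] = value
    # get_content_charset() returns the lower-cased charset, or None if absent.
    return msg.get_content_charset() or default


def build_citation_request(dataset_id: int) -> urllib.request.Request:
    """Ask for the formatted citation string via content negotiation."""
    return urllib.request.Request(
        f"{DOI_BASE}{dataset_id}",
        headers={"Accept": "text/x-bibliography"},
    )


def fetch_citation(dataset_id: int, timeout: float = 30.0) -> str:
    """Download the citation and decode it with the declared charset."""
    with urllib.request.urlopen(build_citation_request(dataset_id), timeout=timeout) as resp:
        charset = charset_from_content_type(resp.headers.get("Content-Type"))
        return resp.read().decode(charset).strip()
```

Falling back to UTF-8 only when no charset parameter is present keeps the code correct even if the server declares a different encoding.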

uschindler commented 2 years ago

See also those slides: https://docs.google.com/presentation/d/1mJEufjTK0O823Yc4zmsiNLua77_6p3UsBSXaVjx1A54/edit?usp=sharing

(our own code should follow the official recommendations and not häckidyhickhack with non-standard query parameters)