Closed uschindler closed 1 year ago
See also these slides: https://docs.google.com/presentation/d/1mJEufjTK0O823Yc4zmsiNLua77_6p3UsBSXaVjx1A54/edit?usp=sharing
(our own code should follow the official recommendations and not häckidyhickhack with non-standard query parameters)
I reviewed the current code that downloads datasets and found that it does a lot of if/then/else and parses XML files to figure out whether datasets are freely accessible or whether they are parents. It does this because it needs to guess the data type. It also looks like the code tries not to hammer PANGAEA with useless requests, but that is no problem at all: the response saying that a content type is not supported is cheap, and the HTTP status code comes back fast. I'd do the data download like this:
Authorization: Bearer token
if available (see below). There is no need to check beforehand whether the dataset is login protected. Just always send the token when you have one.
Accept: text/tab-separated-values
as header. This enables content negotiation. Because this header does NOT look like a plain stupid browser, the PANGAEA code switches to real "REST mode" and, for example, responds with correct headers instead of redirecting to the login page if the dataset is password protected and the credentials do not match. So you don't need to guess whether you were redirected and received the HTML login page. A real REST client gets a correct status code that tells it: "unauthorized".
This should always return the normal tab-separated-values format. There is no need to cross-check the content type in the response or anything like that. The download code should only look at the status code.
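A minimal sketch of such a download, using only the Python standard library. The base URL, the helper names `build_data_request` and `download_data`, and the injectable `opener` parameter are assumptions made for illustration, not part of pangaeapy. Only the 401 case is named in the text, so every other error status is simply passed on with its code.

```python
import urllib.error
import urllib.request

# Assumption: datasets resolve under this base URL; adjust if yours differs.
BASE_URL = "https://doi.pangaea.de/10.1594/PANGAEA."


def build_data_request(dataset_id, token=None):
    """Build a request that negotiates tab-separated values (hypothetical helper)."""
    headers = {"Accept": "text/tab-separated-values"}
    if token:
        # Always send the token when we have one; no prior access check.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{BASE_URL}{dataset_id}", headers=headers)


def download_data(dataset_id, token=None, opener=urllib.request.urlopen):
    """Return the dataset as text, deciding only on the HTTP status code."""
    request = build_data_request(dataset_id, token)
    try:
        with opener(request) as response:
            # The default charset is UTF-8, so no Content-Type inspection here.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise PermissionError(f"dataset {dataset_id}: unauthorized") from err
        # Any other error status is reported as-is; no HTML or XML guessing.
        raise RuntimeError(f"dataset {dataset_id}: HTTP {err.code}") from err
```

The `opener` argument exists only so the function can be exercised without network access; in normal use the default `urllib.request.urlopen` is called.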
If you want to get the native PANGAEA metadata in panmd format, please DO NOT use OAI-PMH (I think pangaear does this; I'm not sure about pangaeapy). The native PANGAEA metadata can and should also be retrieved by content negotiation:
Accept: application/vnd.pangaea.metadata+xml
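As a rough sketch, the metadata request differs from the data request only in the `Accept` value. The base URL, the helper name `fetch_metadata`, and the `opener` parameter are assumptions for illustration. The example returns the parsed XML root and makes no claims about the element names inside the panmd document.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: datasets resolve under this base URL; adjust if yours differs.
BASE_URL = "https://doi.pangaea.de/10.1594/PANGAEA."
PANMD_TYPE = "application/vnd.pangaea.metadata+xml"


def fetch_metadata(dataset_id, token=None, opener=urllib.request.urlopen):
    """Fetch native metadata via content negotiation (hypothetical helper)."""
    headers = {"Accept": PANMD_TYPE}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(f"{BASE_URL}{dataset_id}", headers=headers)
    with opener(request) as response:
        # Parsing from bytes lets ElementTree honour the XML's own encoding.
        return ET.fromstring(response.read())
```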
And finally to get the citation string use:
Accept: text/x-bibliography
(The default charset is always UTF-8.) The current code does not parse any charset parameter on the Content-Type.
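A citation lookup along these lines might look as follows. The base URL, the helper name `fetch_citation`, and the `opener` parameter are assumptions for illustration. The body is decoded as UTF-8, relying on the default charset stated above.

```python
import urllib.request

# Assumption: datasets resolve under this base URL; adjust if yours differs.
BASE_URL = "https://doi.pangaea.de/10.1594/PANGAEA."


def fetch_citation(dataset_id, opener=urllib.request.urlopen):
    """Return the citation string for a dataset (hypothetical helper)."""
    request = urllib.request.Request(
        f"{BASE_URL}{dataset_id}",
        headers={"Accept": "text/x-bibliography"},
    )
    with opener(request) as response:
        # UTF-8 is taken as the default charset; trailing newlines are dropped.
        return response.read().decode("utf-8").strip()
```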