Rewrite parts of data download to simplify especially error handling: Use content negotiation

I reviewed the current code for downloading datasets. It does a lot of if/then/else and parses XML files to figure out whether datasets are freely accessible or whether they are parents, because it needs to guess the data type. It also looks like the code tries not to hammer PANGAEA with useless requests. That is not a problem at all: the response saying that a content type is not supported is cheap, and the HTTP status code comes back fast. I'd do the data download like this:

Use the plain DOI as the URL for the download (both "https://doi.pangaea.de" and "https://doi.org" work, as do other variants). Previously no download was possible with a doi.org URL because the "format=" parameter got lost.
Set Authorization: Bearer <token> if a token is available (see below). There is no need to check beforehand whether the dataset is login protected. Just always send the token if you have one.
Set Accept: text/tab-separated-values as a header. This enables content negotiation. Because a request with this header does NOT look like it comes from a plain browser, the PANGAEA code switches to real "REST mode" and, for example, responds with correct status codes instead of redirecting to the login page when the dataset is password protected and the credentials do not match. So you don't need to guess whether you were redirected and received the HTML login page. A real REST client gets the correct status code and knows: "unauthorized".
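The three steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the existing pangaeapy code: the function name `build_download_request` and the placeholder DOI `10.1594/PANGAEA.000000` are made up for this example, and the sketch only builds the request without sending it.

```python
import urllib.request

def build_download_request(doi, token=None, accept="text/tab-separated-values"):
    """Build (but do not send) a content-negotiation request for a dataset DOI.

    Hypothetical helper for illustration; the name is not part of pangaeapy.
    """
    doi = doi.strip()
    # Accept a bare DOI as well as the common URL variants.
    for prefix in ("https://doi.pangaea.de/", "https://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    url = "https://doi.pangaea.de/" + doi

    # The Accept header is what switches the server to content negotiation.
    headers = {"Accept": accept}
    if token:
        # Always send the token when one is available; no pre-check needed.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, headers=headers)

# Usage with a placeholder DOI (not a real dataset):
req = build_download_request("https://doi.org/10.1594/PANGAEA.000000", token="abc")
print(req.full_url, dict(req.header_items()))
```

Normalising to one resolver is a choice made only to keep the sketch short; according to the description above, the doi.org variant should work just as well.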

This should always return the normal tab-separated-values format. There is no need to cross-check the Content-Type of the response or anything like that. The download code should only look at the status code:

200 (OK): All went well; you can be sure it is a tab-delimited matrix in PANGAEA format.
401 (Unauthorized): The dataset is protected and the access rights do not match the bearer token (e.g., wrong user), or no bearer token was sent at all. This can be reported as an error message.
406 (Not Acceptable): The format in the Accept header cannot be fulfilled. This happens when the dataset is a parent or another type of collection, or a static URL dataset with a different media type.
404 (Not Found): The dataset does not exist.
429 (Too Many Requests): Wait a few seconds and retry.
5xx: Some server error; 503 in particular means "PANGAEA is down". Report this as a hard error to the user.
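A minimal sketch of this status handling, assuming the HTTP call is wrapped in a callable that returns a `(status, body)` pair. The names `PangaeaDownloadError` and `fetch_with_retry`, the retry count, and the wait time are all assumptions made for illustration, not existing pangaeapy API:

```python
import time

class PangaeaDownloadError(Exception):
    """Hypothetical error type used only in this sketch."""

# Messages for the status codes that map directly to a user-facing error.
MESSAGES = {
    401: "dataset is protected and the bearer token is missing or lacks access",
    404: "dataset does not exist",
    406: "dataset is not available as tab-separated values "
         "(parent, collection, or static file with another media type)",
}

def fetch_with_retry(send, max_retries=3, wait_seconds=5.0, sleep=time.sleep):
    """Call send() and decide what to do based on the status code alone.

    send is any callable returning (status, body); it stands in for the
    real HTTP request so that the logic can be tried without a network.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_retries:
            sleep(wait_seconds)  # rate limited: wait a few seconds, try again
            continue
        if status == 429:
            raise PangaeaDownloadError("429: still rate limited after retries")
        if status in MESSAGES:
            raise PangaeaDownloadError(f"{status}: {MESSAGES[status]}")
        if 500 <= status < 600:
            raise PangaeaDownloadError(f"{status}: server error, PANGAEA may be down")
        raise PangaeaDownloadError(f"{status}: unexpected status code")

# Usage with canned responses instead of real requests:
responses = iter([(429, b""), (200, b"Depth\tTemp\n1\t4.2\n")])
print(fetch_with_retry(lambda: next(responses), sleep=lambda s: None))
```

Passing `sleep` as a parameter is only there so the retry path can be exercised without actually waiting.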

If you want to get the native PANGAEA metadata in panmd format, please DO NOT use OAI-PMH (I think pangaear does this; I am not sure about pangaeapy). The native PANGAEA metadata can and should also be retrieved by content negotiation: Accept: application/vnd.pangaea.metadata+xml

And finally, to get the citation string, use: Accept: text/x-bibliography (the default charset is always UTF-8). The current code does not parse any charset parameter on the Content-Type.
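To make the three media types and the charset point concrete, here is a small sketch. The dictionary `ACCEPT_TYPES` and the helper `decode_body` are hypothetical names for this example; the charset is read from the Content-Type header with the standard library's `email.message.Message`, falling back to UTF-8 when no charset parameter is present.

```python
from email.message import Message

# Media types named in this issue, keyed by what the caller wants.
ACCEPT_TYPES = {
    "data": "text/tab-separated-values",
    "metadata": "application/vnd.pangaea.metadata+xml",
    "citation": "text/x-bibliography",
}

def decode_body(body, content_type=None, default_charset="utf-8"):
    """Decode response bytes using the charset parameter of Content-Type.

    Falls back to default_charset when the header is missing or has no
    charset parameter.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default_charset)
    return body.decode(charset)

# Usage with made-up response bytes:
print(decode_body("Müller, A. (2020)".encode("utf-8"), "text/x-bibliography"))
print(decode_body(b"caf\xe9", "text/x-bibliography; charset=ISO-8859-1"))
```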

