openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

API occasionally unstable, with unusable exceptions or strange HTTP error codes #1083

Open mitar opened 3 years ago

mitar commented 3 years ago

We contact the OpenML REST API from our CI with a few requests. Every so often our tests fail, seemingly because the server is under load. See for example this run: https://gitlab.com/datadrivendiscovery/d3m/-/jobs/924199064

When using the OpenML Python package we get:

openml.exceptions.OpenMLServerException: Database connection error. Usually due to high server load. Please wait for N seconds and try again. - None

Why does it say "N seconds" instead of an actual number? That makes it hard to wrap the call, wait, and retry. Or am I missing some information?
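Since the message gives no concrete wait time, the only option I see on the client side is to guess one. A minimal sketch of such a wrapper, assuming exponential backoff is acceptable; the helper name `retry_with_backoff` and its defaults are made up for illustration and are not part of the OpenML package:

```python
import time


def retry_with_backoff(func, retry_on=(Exception,), attempts=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: the delays are a guess, because the server's
    error message does not say how long to wait.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

In our CI this would be called with `retry_on=(openml.exceptions.OpenMLServerException,)`, the exception class from the traceback above, and a function that performs the actual request.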

When using the REST API directly, the error is:

requests.exceptions.HTTPError: 412 Client Error: Precondition Failed for url: https://www.openml.org/api/v1/json/task/168861

I do not think 412 is the right code here. 412 is meant for conditional requests, such as cache validation. I think you want 429 Too Many Requests with a Retry-After header, or 503 Service Unavailable.
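To illustrate why a Retry-After header would help: a client could then compute the wait time instead of guessing. A rough sketch, assuming the header carries either a number of seconds or an HTTP date (the two forms the HTTP spec allows); the function name `retry_after_seconds` and the 5 second fallback are my own choices, not anything OpenML provides:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=5.0):
    """Turn a Retry-After header value into a number of seconds to wait.

    Hypothetical helper; `default` is an arbitrary fallback used when the
    header is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "120".
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (when - now).total_seconds())
```

With `requests`, the value would come from `response.headers.get("Retry-After")` on a 429 or 503 response.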

mitar commented 3 years ago

cc @dmartinez05

joaquinvanschoren commented 3 years ago

Thanks. We're working on the server load issue right now. Also see here: https://github.com/openml/OpenML/issues/1092

It seems that we indeed misinterpreted the meaning of status code 412. We use it to indicate that the server could not answer the request because some information (e.g. a dataset id or an API key) was missing from the request. It should also tell you what was missing. Didn't it?

429 and 503 are not right either. Maybe this should simply be a '400 Bad Request'. I'll open another issue for that. Closing this now to avoid duplicates.

mitar commented 3 years ago

So just to be clear, this issue is reporting that the same requests sometimes fail and sometimes succeed. So it is not that some parameter is missing from the request (for that, 400 is a good response). The problem here is that some underlying error occurs and is then propagated up to the HTTP response. You could also just return 500 in those cases, BTW.

So I think that if this is really about rate limiting or something similar, then 429 is a better response. If it is just an internal error (like a failed connection to the internal database), then 500 is good.

I think the simplest thing would be to return 500 in such cases.
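To spell out the mapping I have in mind, here is a sketch in Python. It is purely illustrative: I do not know how the OpenML server classifies its failures, so the `Failure` categories, the `response_for` function, and the 30 second default are all hypothetical.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical failure categories, not taken from the OpenML code."""
    BAD_REQUEST = "bad_request"    # e.g. missing dataset id or API key
    RATE_LIMITED = "rate_limited"  # this client sent too many requests
    OVERLOADED = "overloaded"      # server temporarily cannot keep up
    INTERNAL = "internal"          # e.g. database connection failed


def response_for(failure, retry_after=30):
    """Return (status_code, headers) for a failure category."""
    if failure is Failure.BAD_REQUEST:
        return 400, {}
    if failure is Failure.RATE_LIMITED:
        # Tell the client how long to back off.
        return 429, {"Retry-After": str(retry_after)}
    if failure is Failure.OVERLOADED:
        return 503, {"Retry-After": str(retry_after)}
    # Anything else is an internal error.
    return 500, {}
```

The point is only that a client can tell "fix your request" (400) apart from "try again later" (429, 503) and "something broke on our side" (500).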