JSON decode error on Nginx 504 Gateway Time-out from Meilisearch Cloud

sauloperez commented 2 months ago

Description

When receiving a 504 Gateway timeout on a call to index.update_documents, the SDK raises a JSONDecodeError instead of the expected MeilisearchApiError. You can see the stacktrace we found in our Sentry account in the Screenshots or Logs section below.

Note we're using Meilisearch Cloud so the underlying issue is related to Meilisearch's Cloud unresponsiveness.

Expected behavior

One would expect to get a meaningful MeilisearchApiError so we can retry the operation at a later time, or any other fault-tolerance mechanism.

Current behavior

A JSONDecodeError was raised, which initially makes you think there's a problem on the way the documents are serialized.

Screenshots or Logs Screenshot from 2024-07-04 10-52-01

Environment (please complete the following information):

OS: Debian 11.10
Meilisearch version: v1.8.1
meilisearch-python version: 0.31.3

mde-pach commented 2 months ago

Indeed, there is a premise in the way the package is managing such errors.

Here https://github.com/meilisearch/meilisearch-python/blob/127b47805f92f6b26fa420e8f59b8de26cd05007/meilisearch/errors.py#L28 And here https://github.com/meilisearch/meilisearch-python/blob/127b47805f92f6b26fa420e8f59b8de26cd05007/meilisearch/_httprequests.py#L137

We can see that every errors are intended to be MeilisearchApiError and then follow the same data validation/formatting process as any encountered errors.

To me, it looks like the error management need to handle differently errors that are returned by meilisearch (which are intended to be, following this implementation, only in a json format) and errors that can be returned by some in the middle third part tool like a reverse proxy (nginx here).

There is multiple way to handle such use case, the best one to me is introduced by the RFC 9457 but it's not commonly adopted by systems we are working with today so it doesn't look like a valid option here (maybe in 10 years 😅).

Another solution would be to being able to differentiate meilisearch responses from others. We can use a custom header set by meilisearch server to only try to handle errors as MeilisearchApiError in this use case and manage an ExternalWhateverError for others.

We can also just try/except on json decoding and manage it another way but you can't distinguish not json formatted errors returned by meilisearch (which are not supposed to happen) from not json formatted returned by other services

Regardless the solution you chose to implement, I would be pleased to do it myself in this repository :)

sanders41 commented 2 months ago

I have not had time to really look at this, but a simple solution could potentially be changing

        if request.text:
            json_data = json.loads(request.text)
            self.message = json_data.get("message")
            self.code = json_data.get("code")
            self.link = json_data.get("link")
            self.type = json_data.get("type")

to

        if request.content:
            json_data = request.json()
            self.message = json_data.get("message")
            self.code = json_data.get("code")
            self.link = json_data.get("link")
            self.type = json_data.get("type")

This is untested so I'm not sure if it would solve the issue, but maybe.

mde-pach commented 2 months ago

If the data is not JSON decodable response.json() will still raise a requests.exceptions.JSONDecodeError

The main point for this issue is to:

Identify when the data is not in an expected format
Find what to do for such data

sauloperez commented 2 months ago

and it happened again :see_no_evil:

curquiza commented 2 months ago

Hello here, I'm discussing with Meilisearch team. You problem might not be related to the Python SDK (even if changes can always be applied, but let's try to mitigate the problem first).

We had issues with v1.8.0, v1.8.1 and v1.8.2: soft lock and memory leaks we solved in v1.8.3. As far as we know, you are using v1.8.1 on the Cloud, let me know if I'm wrong. Sorry for the inconvenience but we recommend you migrate to the latest version of Meilisearch. This can be easily done, cf documentation.

sanders41 commented 2 months ago

@mde-pach I put in a fix for this in another library here. Do you want to try to do something similar? The idea is if the response isn't JSON we know the error didn't come from Meilisearch so raise the error as is rather than trying to process it.

But as @curquiza said, this won't solve the underlying issue, only hopefully make it easier to understand what is happening.

brunoocasali commented 2 months ago

Hi folks!

I was discussing with @curquiza and the team that we should change the content types of these errors in our Meilisearch Cloud infra. You got the parsing error because the message comes from our proxy layer, not the meilisearch instance, which is HTML. That's why they are so different.

So, we could add special handling like @sanders41 in his SDK, but ideally, everything should be JSON in the first place, so I don't think it is worth it until we change on our side first. Changing to JSON would unlock more possibilities for handling them, like creating special classes to give the consumer the power to decide what they should do when they face a network error like this.

mde-pach commented 2 months ago

Hi folks!

I was discussing with @curquiza and the team that we should change the content types of these errors in our Meilisearch Cloud infra. You got the parsing error because the message comes from our proxy layer, not the meilisearch instance, which is HTML. That's why they are so different.

So, we could add special handling like @sanders41 in his SDK, but ideally, everything should be JSON in the first place, so I don't think it is worth it until we change on our side first. Changing to JSON would unlock more possibilities for handling them, like creating special classes to give the consumer the power to decide what they should do when they face a network error like this.

Make sense to me :)

sanders41 commented 2 months ago

@sauloperez if upgrading Meilisearch doesn't solve the issue, you could add tenacity to your functions that are getting error responses to have them retry. It would be a work around and not a real fix, but might help.

sauloperez commented 2 months ago

thanks everyone, we'll definitely go with tenacity or backoff, while we also try to upgrade to v1.9, which has failed so far.

meilisearch / meilisearch-python

JSON decode error on Nginx 504 Gateway Time-out from Meilisearch Cloud #990