nats-io / nats.py

Python3 client for NATS
https://nats-io.github.io/nats.py/
Apache License 2.0
872 stars 180 forks source link

KeyError in _request_new_style #314

Closed mriedem closed 6 months ago

mriedem commented 2 years ago

This issue (#187), or maybe something related, seems to be present still in 2.1.0.

We were hitting some TimeoutErrors in an exponential backoff loop trying to submit a request (network issues presumably). Eventually request raised a KeyError, here is a snip of the traceback:

res = await client.request(topic, msg, timeout=timeout)\n  File \"/home/auckland/.virtualenvs/dispatcher/lib64/python3.8/site-packages/nats/aio/client.py\", line 899, in request\n    msg = await self._request_new_style(\n  File \"/home/auckland/.virtualenvs/dispatcher/lib64/python3.8/site-packages/nats/aio/client.py\", line 937, in _request_new_style\n    self._resp_map.pop(token.decode())\nKeyError: '<redacted>'\n"

I see that #219 changed pop to be graceful if the entry wasn't in the dict:

self._resp_map.pop(token.decode(), None)

And that was reverted in #229, but we can apparently still hit a KeyError here:

https://github.com/nats-io/nats.py/blob/v2.1.0/nats/aio/client.py#L937

I won't profess to understand the internals of the library for how this can happen, but there is apparently still a race condition that's possible here on the latest release. Should this issue be re-opened? Should I open a new issue referencing this one? I don't really want to add KeyError handling in our code calling request but it crashed the app so we might need to for the time being.

Originally posted by @mriedem in https://github.com/nats-io/nats.py/issues/187#issuecomment-1128941938

wallyqs commented 2 years ago

Thanks @mriedem, I'm not a fan that a KeyError crashed your app so what I will do is to to instead send that error to the async error callback while we are investigating the root cause.

mriedem commented 2 years ago

Thanks @mriedem, I'm not a fan that a KeyError crashed your app so what I will do is to to instead send that error to the async error callback while we are investigating the root cause.

Sounds good, thank you for the prompt response. We provide an error_cb so we'll be sure to log it.

wallyqs commented 2 years ago

fyi added handling in v2.1.2 release so that error is reported on error callback instead.

mriedem commented 2 years ago

fyi added handling in v2.1.2 release so that error is reported on error callback instead.

I see (f8a5c9906dd9dae42f09a9f566ae1bd90ef94cfa), thanks!

wallyqs commented 6 months ago

Fixed in v2.1.2 https://github.com/nats-io/nats.py/releases/tag/v2.1.2