[feat] Watch() retries 410 errors

tomplus / kubernetes_asyncio

Python asynchronous client library for Kubernetes http://kubernetes.io/

Apache License 2.0

364 stars 71 forks source link

[feat] Watch() retries 410 errors #327

Closed tomplus closed 3 months ago

tomplus commented 3 months ago

Fixes https://github.com/tomplus/kubernetes_asyncio/issues/136

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 28.15%. Comparing base (2126b1d) to head (a43f780).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #327 +/- ## ========================================== + Coverage 28.11% 28.15% +0.03% ========================================== Files 779 779 Lines 92138 92184 +46 ========================================== + Hits 25904 25950 +46 Misses 66234 66234 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

rjhuijsman commented 2 months ago

Thanks for doing this! My team is excited to see this land - hopefully this can go into a release soon! 🙂 🙏🏻

tomplus commented 2 months ago

Thank you. I'm preparing the next release, should be available in the following week.

tomplus commented 2 months ago

@rjhuijsman It has been released :rocket:

ianbuss commented 2 months ago

One quick question about the semantics of this that I'm not sure of. If the server has responded with a 410, the resource_version in the watch is no longer available server-side, so the reconnect would always fail with another 410 unless the resource_version is reset to None in the watch? _reconnect doesn't reset the resource_version as far as I can see. I've probably missed a subtlety here (e.g. one Kube server may still have the older resource version in its cache).

tomplus commented 2 months ago

@ianbuss Thanks for your question.

The idea is to try to reconnect only once with the latest resource_version we have. It may be enough in some situation. If it doesn't help an exception is raised and client has to handle it somehow (probably will fetch list of resources and start watching again).

You are right, that resetting version could make it works forever but we won't be aware that we lost some events. So it's better to make it explicit and raise an exception.

ianbuss commented 2 months ago

@ianbuss Thanks for your question.

The idea is to try to reconnect only once with the latest resource_version we have. It may be enough in some situation. If it doesn't help an exception is raised and client has to handle it somehow (probably will fetch list of resources and start watching again).

You are right, that resetting version could make it works forever but we won't be aware that we lost some events. So it's better to make it explicit and raise an exception.

Agree that is the right call, such cases are best handled by the calling code. The PR interested me as I am developing a general go informer-like library over kubernetes-asyncio for all the core types and 410 Gone is one of the special cases I need to handle. Thanks for the response 👍