tomplus / kubernetes_asyncio

Python asynchronous client library for Kubernetes http://kubernetes.io/
Apache License 2.0

watch exit unexpected #325

Closed. fighterhit closed this issue 1 month ago

fighterhit commented 1 month ago

To do service discovery in my self-hosted k8s cluster, I use the following code to watch for changes to the endpoints in a given namespace, but after a while it exits with the exception below. What is the reason? It takes about 5 minutes from startup to the abnormal exit. Is this related to the timeout_seconds setting? Judging by the exception, it doesn't seem so, and I would expect the watch to keep observing endpoint changes by default.

from kubernetes_asyncio import client, watch


async def watch_endpoints():
    async with client.ApiClient() as api:
        v1 = client.CoreV1Api(api)
        async with watch.Watch().stream(v1.list_namespaced_endpoints, "MY_NS") as stream:
            async for event in stream:
                evt, obj = event["type"], event["object"]
                ips = []
                if obj.subsets:
                    for ep in obj.subsets:
                        # addresses is None when a subset has no ready addresses
                        for addr in ep.addresses or []:
                            ips.append(addr.ip)
                    print(
                        "{} {}/{} endpoints {}".format(
                            evt, obj.metadata.namespace, obj.metadata.name, ips
                        )
                    )

Task exception was never retrieved
future: <Task finished name='Task-1' coro=<watch_endpoints() done, defined at /root/k8s.py:33> exception=ApiException()>
Traceback (most recent call last):
  File "/root/k8s.py", line 37, in watch_endpoints
    async for event in stream:
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/watch/watch.py", line 131, in __anext__
    return await self.next()
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/watch/watch.py", line 174, in next
    return self.unmarshal_event(line, self.return_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/watch/watch.py", line 103, in unmarshal_event
    raise client.exceptions.ApiException(status=obj['code'], reason=reason)
kubernetes_asyncio.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 3250692444 (3250783264)

When I set timeout_seconds=600, the program still exits 5 minutes after startup, but another exception is raised.

Task exception was never retrieved
future: <Task finished name='Task-1' coro=<watch_endpoints() done, defined at /root/k8s.py:33> exception=TimeoutError()>
Traceback (most recent call last):
  File "/root/k8s.py", line 37, in watch_endpoints
    async for event in stream:
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/watch/watch.py", line 131, in __anext__
    return await self.next()
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/watch/watch.py", line 152, in next
    line = await self.resp.content.readline()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 311, in readline
    return await self.readuntil()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 343, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 303, in _wait
    with self._timer:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError

fighterhit commented 1 month ago

According to #259, _request_timeout can be passed as a parameter to watch.Watch().stream. It ends up as the timeout parameter of aiohttp (default 5 min), so raising it avoids the TimeoutError. The other exception (Reason: Expired: too old resource version...) is still thrown once _request_timeout is reached, but at least the watch runs longer.
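
For reference, a minimal sketch of the call described above, assuming a reachable cluster and a local kubeconfig. The request_timeout parameter and its 3600-second default are illustrative choices, not library defaults. The library import is done inside the function so the snippet can be loaded without kubernetes_asyncio installed.

```python
import asyncio  # run with: asyncio.run(watch_endpoints("MY_NS"))


async def watch_endpoints(namespace, request_timeout=3600):
    """Watch endpoints in `namespace` with a longer client-side timeout.

    `request_timeout` (seconds) is a hypothetical parameter name; it is
    forwarded as `_request_timeout`, which per #259 becomes the aiohttp
    timeout for the watch request.
    """
    # Imported lazily so this sketch can be loaded without the library.
    from kubernetes_asyncio import client, config, watch

    await config.load_kube_config()  # assumes a usable local kubeconfig
    async with client.ApiClient() as api:
        v1 = client.CoreV1Api(api)
        async with watch.Watch().stream(
            v1.list_namespaced_endpoints,
            namespace,
            _request_timeout=request_timeout,
        ) as stream:
            async for event in stream:
                obj = event["object"]
                print(event["type"], obj.metadata.namespace, obj.metadata.name)
```

This only postpones the client-side TimeoutError; it does not address the 410 Expired error.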

tomplus commented 1 month ago

Duplicate of #136

If you want to watch forever, it should work without _request_timeout or a timeout. These 410s are the real problem here.
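
Until the 410s are handled by the library, one possible workaround is to reopen the watch whenever the server reports an expired resource version. The sketch below is not the library's behaviour or the fix tracked in #136. The names watch_with_restart, open_stream, handle, max_restarts and delay are all hypothetical. It relies only on the exception carrying the HTTP code in a status attribute, as the ApiException in the traceback above does, and is duck-typed so it runs without kubernetes_asyncio.

```python
import asyncio

GONE = 410  # HTTP status returned for an expired resource version


async def watch_with_restart(open_stream, handle, max_restarts=3, delay=0.0):
    """Consume events from open_stream(), reopening it after a 410 error.

    open_stream: zero-argument callable returning an async iterator of events.
    handle: synchronous callback invoked once per event.
    Returns the number of restarts performed. Errors other than 410, or a
    410 after max_restarts restarts, propagate to the caller.
    """
    restarts = 0
    while True:
        try:
            async for event in open_stream():
                handle(event)
            return restarts  # the stream ended without an error
        except Exception as exc:
            # Re-raise anything that is not a 410, or when out of restarts.
            if getattr(exc, "status", None) != GONE or restarts >= max_restarts:
                raise
        restarts += 1
        await asyncio.sleep(delay)  # optional pause before reopening
```

With kubernetes_asyncio, open_stream would wrap something like watch.Watch().stream(v1.list_namespaced_endpoints, "MY_NS"), creating a new Watch each time. A watch started without a resource version first replays the existing objects as ADDED events, so handle should tolerate seeing the same object again. For a watch that truly runs forever, the loop would also need to reopen the stream when it ends normally, rather than return.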

fighterhit commented 1 month ago

@tomplus Thanks, is there a solution now?

tomplus commented 1 month ago

Not yet, but I'll take a look at it next week.