Open jake-delorme opened 1 year ago
Noticing the same behaviour on EKS with Python 3.11.2, aiohttp 3.8.4, kopf 1.36.0. In my case I am monitoring delete events for Job resources.
What seems to happen is: request fails because of token expiry -> goes to sleep because of backoff -> while sleeping the session gets closed by vault when invalidating previous credentials -> upon waking up and retrying the request it fails with RuntimeError: Session is closed
because it still references the closed session.
We are also noticing the same issues but in AKS.
Anybody found a workaround or solution for this problem? Maybe a way to include this in the probes?
python version 3.11.3 kopf version 1.36.1
What seems to happen is: request fails because of token expiry -> goes to sleep because of backoff -> while sleeping the session gets closed by vault when invalidating previous credentials -> upon waking up and retrying the request it fails with
RuntimeError: Session is closed
because it still references the closed session.
Thanks for this hint! Indeed, this is a possible scenario in some (rare?) cases of API failing several concurrent requests.
Can you please try a fix from https://github.com/nolar/kopf/pull/1031 — the branch is session-closed-in-reauth
.
Depending on your packaging system, that can be installed like this (pip or requirements.txt; different for Poetry):
pip install git+https://github.com/nolar/kopf.git@session-closed-in-reauth
That would be Kopf 1.36.1 plus this fix only.
Hi @nolar
I think the patch doesn't work, I have introduced a couple of probes to check whether the watcher is alive and it seems to always fail.
This happens to us regularly, maybe depending on cluster setup.
In particular i think it all started when we configured: automountServiceAccountToken=false
And started mounting the token using volumen projection with a duration of 3600s.
Seems like kopf doesn't handle the refresh well.
below my logs:
[2023-06-13 11:08:41,259] aiohttp.access [INFO ] 10.251.132.1 [13/Jun/2023:11:08:41 +0000] "GET /healthz HTTP/1.1" 200 229 "-" "kube-probe/1.26"
[2023-06-13 11:09:41,216] kopf.activities.prob [INFO ] Retrieving all key for probe
[2023-06-13 11:09:41,238] kopf.activities.prob [INFO ] Activity 'key' succeeded.
[2023-06-13 11:09:41,238] kopf.activities.prob [INFO ] Retrieving all trust for probe
[2023-06-13 11:09:41,262] kopf.activities.prob [INFO ] Activity 'trust' succeeded.
[2023-06-13 11:09:41,262] aiohttp.access [INFO ] 10.251.132.1 [13/Jun/2023:11:09:41 +0000] "GET /healthz HTTP/1.1" 200 229 "-" "kube-probe/1.26"
[2023-06-13 11:10:21,045] kopf._core.engines.a [INFO ] Re-authentication has been initiated.
[2023-06-13 11:10:21,046] kopf._cogs.clients.w [ERROR ] Request attempt #1/9 failed; will retry: GET https://10.251.0.1:443/apis/com.acme/v1/trusts?watch=true&resourceVersion=1773126&timeoutSeconds=60 -> ClientOSError(1, '[SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2702)')
[2023-06-13 11:10:21,046] kopf._cogs.clients.w [ERROR ] Request attempt #1/9 failed; will retry: GET https://10.251.0.1:443/apis/com.acme/v1/keys?watch=true&resourceVersion=1773126&timeoutSeconds=60 -> ClientOSError(1, '[SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2702)')
[2023-06-13 11:10:21,048] kopf.activities.auth [INFO ] Activity 'login_via_client' succeeded.
[2023-06-13 11:10:21,048] kopf._core.engines.a [INFO ] Re-authentication has finished.
[2023-06-13 11:10:22,048] kopf._cogs.clients.w [ERROR ] Request attempt #2/9 failed; will try re-authenticating: GET https://10.251.0.1:443/apis/com.acme/v1/trusts?watch=true&resourceVersion=1773126&timeoutSeconds=60
[2023-06-13 11:10:22,048] kopf._cogs.clients.w [ERROR ] Request attempt #2/9 failed; will try re-authenticating: GET https://10.251.0.1:443/apis/com.acme/v1/keys?watch=true&resourceVersion=1773126&timeoutSeconds=60
[2023-06-13 11:10:22,049] kopf._core.engines.a [INFO ] Re-authentication has been initiated.
[2023-06-13 11:10:22,050] kopf.activities.auth [INFO ] Activity 'login_via_client' succeeded.
[2023-06-13 11:10:22,050] kopf._core.engines.a [INFO ] Re-authentication has finished.
[2023-06-13 11:10:22,051] kopf._core.reactor.o [ERROR ] Watcher for keys.v1.com.acme@none has failed: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 78, in request
response = await context.session.request(
File "/opt/venv/lib/python3.10/site-packages/aiohttp/client.py", line 400, in _request
raise RuntimeError("Session is closed")
RuntimeError: Session is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
return await fn(*args, **kwargs, context=context)
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 94, in request
raise errors.APISessionClosed("Session is closed.") from e
kopf._cogs.clients.errors.APISessionClosed: Session is closed.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
await coro
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
async for raw_input in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
async for raw_input in api.stream(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 210, in stream
response = await request(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 47, in wrapper
await vault.invalidate(key, exc=e)
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 291, in invalidate
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
[2023-06-13 11:10:22,053] kopf._core.reactor.o [ERROR ] Watcher for trusts.v1.com.acme@none has failed: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 78, in request
response = await context.session.request(
File "/opt/venv/lib/python3.10/site-packages/aiohttp/client.py", line 400, in _request
raise RuntimeError("Session is closed")
RuntimeError: Session is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
return await fn(*args, **kwargs, context=context)
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 94, in request
raise errors.APISessionClosed("Session is closed.") from e
kopf._cogs.clients.errors.APISessionClosed: Session is closed.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
await coro
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
async for raw_input in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
async for raw_input in api.stream(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 210, in stream
response = await request(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 47, in wrapper
await vault.invalidate(key, exc=e)
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 291, in invalidate
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
[2023-06-13 11:10:41,216] kopf.activities.prob [INFO ] Retrieving all key for probe
[2023-06-13 11:10:41,216] kopf.activities.prob [ERROR ] Activity 'key' failed with an exception. Will retry.
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
result = await invoke_handler(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
result = await invocation.invoke(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
result = await fn(**kwargs) # type: ignore
File "/src/ctl/kopf_operator.py", line 166, in get_keys
await check_tasks("keys")
File "/src/ctl/kopf_operator.py", line 187, in check_tasks
raise GenericException(f'Missing watcher for {resource_type} resource type')
utils.exceptions.GenericException: Missing watcher for keys resource type
[2023-06-13 11:10:41,217] kopf.activities.prob [INFO ] Retrieving all trust for probe
[2023-06-13 11:10:41,217] kopf.activities.prob [ERROR ] Activity 'trust' failed with an exception. Will retry.
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
result = await invoke_handler(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
result = await invocation.invoke(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
result = await fn(**kwargs) # type: ignore
File "/src/ctl/kopf_operator.py", line 175, in get_trusts
await check_tasks("trusts")
File "/src/ctl/kopf_operator.py", line 187, in check_tasks
raise GenericException(f'Missing watcher for {resource_type} resource type')
utils.exceptions.GenericException: Missing watcher for trusts resource type
[2023-06-13 11:11:41,218] kopf.activities.prob [INFO ] Retrieving all key for probe
[2023-06-13 11:11:41,218] kopf.activities.prob [ERROR ] Activity 'key' failed with an exception. Will retry.
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
result = await invoke_handler(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
result = await invocation.invoke(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
result = await fn(**kwargs) # type: ignore
File "/src/ctl/kopf_operator.py", line 166, in get_keys
await check_tasks("keys")
File "/src/ctl/kopf_operator.py", line 187, in check_tasks
raise GenericException(f'Missing watcher for {resource_type} resource type')
utils.exceptions.GenericException: Missing watcher for keys resource type
[2023-06-13 11:11:41,218] kopf.activities.prob [INFO ] Retrieving all trust for probe
[2023-06-13 11:11:41,218] kopf.activities.prob [ERROR ] Activity 'trust' failed with an exception. Will retry.
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
result = await invoke_handler(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
result = await invocation.invoke(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
result = await fn(**kwargs) # type: ignore
File "/src/ctl/kopf_operator.py", line 175, in get_trusts
await check_tasks("trusts")
File "/src/ctl/kopf_operator.py", line 187, in check_tasks
raise GenericException(f'Missing watcher for {resource_type} resource type')
utils.exceptions.GenericException: Missing watcher for trusts resource type
[2023-06-13 11:11:51,842] kopf._core.reactor.r [ERROR ] Resource observer has failed: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
await coro
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/observation.py", line 127, in resource_observer
await queueing.watcher(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
async for raw_input in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
async for raw_input in api.stream(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 210, in stream
response = await request(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 43, in wrapper
async for key, info, context in vault.extended(APIContext, 'contexts'):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 158, in extended
async for key, item in self._items():
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 195, in _items
yielded_key, yielded_item = self.select()
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 214, in select
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
Traceback (most recent call last):
File "/opt/venv/bin/kopf", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/kopf/cli.py", line 60, in wrapper
return fn(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/click/decorators.py", line 84, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/kopf/cli.py", line 109, in run
return running.run(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/running.py", line 81, in run
asyncio.run(coro)
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/running.py", line 138, in operator
await run_tasks(operator_tasks, ignored=existing_tasks)
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/running.py", line 419, in run_tasks
await aiotasks.reraise(root_done | root_cancelled | hung_done | hung_cancelled)
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 238, in reraise
task.result() # can raise the regular (non-cancellation) exceptions.
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
await coro
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/observation.py", line 127, in resource_observer
await queueing.watcher(
File "/opt/venv/lib/python3.10/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
async for raw_event in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
async for raw_input in stream:
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
async for raw_input in api.stream(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 210, in stream
response = await request(
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 43, in wrapper
async for key, info, context in vault.extended(APIContext, 'contexts'):
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 158, in extended
async for key, item in self._items():
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 195, in _items
yielded_key, yielded_item = self.select()
File "/opt/venv/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 214, in select
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
These are my dependencies:
typing_extensions == 4.5.0
requests == 2.28.2
click == 8.1.3
jsonpickle == 3.0.1
pyinstaller == 5.8.0
git+https://github.com/nolar/kopf.git@session-closed-in-reauth
kubernetes == 26.1.0
pyOpenSSL == 23.0.0
pyjks == 20.0.0
pytest == 7.2.1
pytest-mock == 3.10.0
requests-mock == 1.10.0
importlib_resources == 5.12.0
Faced it as well, it's a big problem for us
@Spacca did you find some workaround or just live with that?
We ended up loading the re-authentication hook with a very rudimentary check to force the operator to restart. For us this was every 10 mins. Mileage may vary.
@kopf.on.login()
async def update_credentials(logger, memo: kopf.Memo, **_):
# This is a dirty hack to try and get the operator to restart when resource observers die
count = 0
intrestings_types = ("peering observer", "watcher for", "peering keep-alive", "resource observer", "pause-waiter for")
for task in asyncio.all_tasks():
task_type = " ".join(task.get_name().split()[:2])
if task_type in intrestings_types:
count += 1
if memo.get("task_count", 0) <= count:
logger.debug(f"Watching for {count} intresting tasks. previously {memo.get('task_count')}")
memo["task_count"] = count
else:
logger.error(f"An intresting task went away (were {memo.get('task_count')} now {count}). Exiting the operator")
exit(1)
Long story short
Operator starts up and runs normally as expected. After for running for some time some of the watch streams will throw a
RuntimeError("Session is closed")
. Once this happens that watch stream will never restart until the operator is restarted. This only appears to happen with custom resources (configmaps are fine).Kopf version
1.35.6
Kubernetes version
v1.23.13 eks
Python version
3.9.13
Code
Logs
Additional information
This is an EKS cluster and is using aws eks get-token in the kubeconfig to authenticate.
Using aiohttp version 3.8.3 using kubernetes client version: 24.2.0 aws-cli/2.2.43 Python/3.8.8 (used by kubeconfig)
Not all operators watch streams die at the same time. This is not running in a container on the cluster but on a server outside of aws.