When using a ConnectionInfo with an expiration, which is required to work with expiring tokens,
the operator does not properly leave peering and does not exit.
It hangs because updating or leaving the peering needs valid credentials, but the vault ends up in a need_reauth state with no 'Credentials retriever' task left to populate it with new credentials.
Kopf version
main
Kubernetes version
any
Python version
3.10.10
Code
import datetime
from typing import Any, Optional, Sequence

import kopf
import kubernetes_asyncio.client
import kubernetes_asyncio.config


@kopf.on.login()
async def authenticate(
        *,
        logger: kopf.Logger,
        **_: Any,
) -> Optional[kopf.ConnectionInfo]:
    try:
        kubernetes_asyncio.config.load_incluster_config()  # cluster env vars
        logger.debug("Async client is configured in cluster with service account.")
    except kubernetes_asyncio.config.ConfigException as e1:
        try:
            await kubernetes_asyncio.config.load_kube_config()  # developer's config files
            logger.debug("Async client is configured via kubeconfig file.")
        except kubernetes_asyncio.config.ConfigException as e2:
            raise kopf.LoginError("Cannot authenticate the async client library "
                                  "neither in-cluster, nor via kubeconfig.")

    # We do not even try to understand how it works and why. Just load it, and extract the results.
    # For kubernetes client >= 12.0.0 use the new 'get_default_copy' method
    if callable(getattr(kubernetes_asyncio.client.Configuration, 'get_default_copy', None)):
        config = kubernetes_asyncio.client.Configuration.get_default_copy()
    else:
        config = kubernetes_asyncio.client.Configuration()

    # For auth-providers, this method is monkey-patched with the auth-provider's one.
    # We need the actual auth-provider's token, so we call it instead of accessing api_key.
    # Other keys (token, tokenFile) also end up being retrieved via this method.
    header: Optional[str] = config.get_api_key_with_prefix('BearerToken')
    parts: Sequence[str] = header.split(' ', 1) if header else []
    scheme, token = ((None, None) if len(parts) == 0 else
                     (None, parts[0]) if len(parts) == 1 else
                     (parts[0], parts[1]))  # RFC-7235, Appendix C.

    # A short expiration makes the hang easy to reproduce.
    #expiration = datetime.datetime.utcnow() + datetime.timedelta(minutes=1)
    expiration = datetime.datetime.utcnow() + datetime.timedelta(seconds=10)
    #expiration = None
    return kopf.ConnectionInfo(
        server=config.host,
        ca_path=config.ssl_ca_cert,  # can be a temporary file
        insecure=not config.verify_ssl,
        username=config.username or None,  # an empty string when not defined
        password=config.password or None,  # an empty string when not defined
        scheme=scheme,
        token=token,
        certificate_path=config.cert_file,  # can be a temporary file
        private_key_path=config.key_file,  # can be a temporary file
        priority=1,
        expiration=expiration,
    )
Logs
^C[2023-06-27 21:58:49,358] kopf._core.reactor.r [INFO ] Signal SIGINT is received. Operator is stopping.
[2023-06-27 21:58:49,358] kopf._core.reactor.r [DEBUG ] Admission mutating configuration manager is cancelled.
[2023-06-27 21:58:49,359] kopf._core.reactor.r [DEBUG ] Admission insights chain is cancelled.
[2023-06-27 21:58:49,359] kopf._core.reactor.r [DEBUG ] Namespace observer is cancelled.
[2023-06-27 21:58:49,359] kopf._core.reactor.r [DEBUG ] Credentials retriever is cancelled.
[2023-06-27 21:58:49,359] kopf._core.reactor.r [DEBUG ] Admission webhook server is cancelled.
[2023-06-27 21:58:49,359] kopf._core.reactor.r [DEBUG ] Admission validating configuration manager is cancelled.
[2023-06-27 21:58:49,360] kopf._core.reactor.r [DEBUG ] Poster of events is cancelled.
[2023-06-27 21:58:49,361] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
[2023-06-27 21:58:49,361] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for clusterkopfpeerings.v1.kopf.dev cluster-wide.
[2023-06-27 21:58:49,363] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for netcenterips.v1alpha1.netcenter.hpc.ethz.ch cluster-wide.
[2023-06-27 21:58:49,363] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for services.v1 cluster-wide.
[2023-06-27 21:58:49,363] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for ingresses.v1.networking.k8s.io cluster-wide.
[2023-06-27 21:58:49,363] kopf._core.reactor.r [DEBUG ] Daemon killer is cancelled.
[2023-06-27 21:58:49,363] kopf._core.reactor.r [DEBUG ] Resource observer is cancelled.
[2023-06-27 21:58:59,370] kopf._core.reactor.o [DEBUG ] Streaming tasks are not stopped: finishing normally; tasks left: {<Task pending name='peering keep-alive for default@None' coro=<guard() running at ./kopf/kopf/_cogs/aiokits/aiotasks.py:108> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/lib/python3.10/asyncio/tasks.py:864, Task.task_wakeup()]>>}
[2023-06-27 21:59:09,379] kopf._core.reactor.o [DEBUG ] Streaming tasks are not stopped: finishing normally; tasks left: {<Task pending name='peering keep-alive for default@None' coro=<guard() running at ./kopf/kopf/_cogs/aiokits/aiotasks.py:108> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/lib/python3.10/asyncio/tasks.py:864, Task.task_wakeup()]>>}
[2023-06-27 21:59:19,386] kopf._core.reactor.o [DEBUG ] Streaming tasks are not stopped: finishing normally; tasks left: {<Task pending name='peering keep-alive for default@None' coro=<guard() running at ./kopf/kopf/_cogs/aiokits/aiotasks.py:108> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/lib/python3.10/asyncio/tasks.py:864, Task.task_wakeup()]>>}
... for a long time until finally
./run: line 8: 2849365 Killed kopf run --all-namespaces $@ ./handlers.py
Additional information
This only happens with peering enabled and when the ConnectionInfo has an expiration set.
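To make the suspected sequence easier to discuss, here is a toy model of the hang in plain asyncio. It is only a sketch and not Kopf's actual code: `ToyVault`, `retriever`, `leave_peering`, and `simulate` are hypothetical names made up for illustration, and the shutdown order (retriever cancelled first, peering still pending) is taken from the logs above.

```python
import asyncio
from typing import Optional


class ToyVault:
    """Hypothetical stand-in for a credentials vault (not Kopf's real class)."""

    def __init__(self) -> None:
        self._token: Optional[str] = None
        self._ready = asyncio.Event()
        self.need_reauth = asyncio.Event()

    def populate(self, token: str) -> None:
        self._token = token
        self.need_reauth.clear()
        self._ready.set()

    def invalidate(self) -> None:
        # Expired credentials are dropped; consumers must wait for new ones.
        self._token = None
        self._ready.clear()
        self.need_reauth.set()

    async def get_token(self) -> str:
        await self._ready.wait()
        assert self._token is not None
        return self._token


async def retriever(vault: ToyVault) -> None:
    # Plays the role of the 'Credentials retriever' task: re-populates on demand.
    while True:
        await vault.need_reauth.wait()
        vault.populate("fresh-token")


async def leave_peering(vault: ToyVault) -> str:
    # Leaving the peering needs valid credentials, so it waits on the vault.
    token = await vault.get_token()
    return f"left peering with {token}"


async def simulate(cancel_retriever_first: bool) -> str:
    vault = ToyVault()
    vault.populate("initial-token")
    task = asyncio.create_task(retriever(vault))
    await asyncio.sleep(0)  # let the retriever start waiting
    if cancel_retriever_first:
        # Shutdown order seen in the logs: the retriever goes away first.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
    vault.invalidate()  # the token expires during shutdown
    try:
        return await asyncio.wait_for(leave_peering(vault), timeout=0.2)
    except asyncio.TimeoutError:
        return "hung: nobody refreshes the credentials"
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(simulate(cancel_retriever_first=True)))
    print(asyncio.run(simulate(cancel_retriever_first=False)))
```

In this model the peering step only completes when the retriever is still alive at the moment the token expires, which matches the behaviour described in this report. Whether the real fix is to keep the retriever running until peering is left, or to handle it differently, is a question for the maintainers.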