Closed: yorik closed this issue 1 month ago
Perhaps this is a bit counter-intuitive, but watcher::Error is an error type that we mostly propagate for logging and our own recovery. Errors may occur, but the variants in that enum are generally recoverable.

The problem is your unwrap(). You can run the stream with a for_each (as e.g. this) and it should recover (the watcher will relist from rv=0).
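For illustration (not part of the original comment), here is a minimal sketch of that for_each pattern; the namespace and variable names are placeholders, not taken from the report:

```rust
use futures::StreamExt;
use k8s_openapi::api::core::v1::Event;
use kube::{
    runtime::{watcher, WatchStreamExt},
    Api, Client,
};

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    let client = Client::try_default().await?;
    // "default" is a placeholder namespace, not from the original report.
    let events: Api<Event> = Api::namespaced(client, "default");

    watcher(events, watcher::Config::default())
        .applied_objects()
        .for_each(|res| async move {
            match res {
                // Each item is a Result; logging errors instead of unwrapping
                // lets the watcher keep running and retry internally.
                Ok(ev) => println!("reason: {:?}, message: {:?}", ev.reason, ev.message),
                Err(err) => eprintln!("watch error (recoverable): {err:?}"),
            }
        })
        .await;
    Ok(())
}
```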
Hey, there are really several separate issues going on here:

stream.try_next().await.unwrap()

This will panic on the first error received. watcher is designed to retry in the face of errors, but will still propagate them to you as well. However, unwrap() will cause it to panic (and implicitly terminate the watch due to unwinding).

Instead, you could do something like:
```rust
// `next()` (from futures::StreamExt) yields Option<Result<_, _>>,
// so a watch error can be logged without ending the loop.
while let Some(event) = stream.next().await {
    match event {
        Ok(event) => println!(
            "Event in namespace {}: reason: {:?}, message: {:?}",
            ns, event.reason, event.message
        ),
        Err(err) => eprintln!("Error during watch: {err:?}"),
    }
}
```
When reconnecting, watcher will try to pick up the stream based on the timestamp (resourceVersion) of the last message it received. However, Kubernetes only keeps the replay cache for so long, and will reject attempts to reconnect if that resourceVersion is too old. This can happen if the newest message is older than the size of the replay cache. kube tries to work around this using BOOKMARK messages (https://docs.rs/kube/latest/kube/runtime/watcher/struct.Config.html#structfield.bookmarks), which request that the API server send periodic empty messages with just the latest timestamp during idle periods. It may be worth investigating whether these bookmark messages are not being sent/interpreted for some reason.
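(As a hedged illustration, not something from the thread: the field linked above is a plain bool on watcher::Config, so toggling it is one way to experiment with whether bookmarks are involved.)

```rust
use kube::runtime::watcher;

// `bookmarks` defaults to true; disabling it here is purely for
// experimentation when investigating the 410 "Expired" behaviour.
let cfg = watcher::Config {
    bookmarks: false,
    ..watcher::Config::default()
};
```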
I didn't expect that I could just keep calling stream.try_next() after an error, but it works! Thank you so much for the explanation!
Maybe https://github.com/kube-rs/kube/blob/main/examples/event_watcher.rs#L48 should be updated with the match?
Updated the example in #1616.
Thank you! Looks good to me.
Current and expected behavior
When I'm trying to watch k8s events, the watcher exits with the error

ErrorResponse { status: "Failure", message: "The resourceVersion for the provided watch is too old.", reason: "Expired", code: 410 }

instead of retrying the connection. It looks like this happens only for Event, and probably when there is nothing going on in a namespace. The namespace from the example has a bunch of deployments, but all of them have 0 replicas.
Possible solution
No response
Additional context
Code to prove the bug:
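(The original reproduction is not included in this extract; the sketch below reconstructs the pattern discussed above — a namespaced Event watch driven by stream.try_next().await.unwrap() — with placeholder names such as the namespace.)

```rust
use futures::{pin_mut, TryStreamExt};
use k8s_openapi::api::core::v1::Event;
use kube::{
    runtime::{watcher, WatchStreamExt},
    Api, Client,
};

#[tokio::main]
async fn main() {
    let client = Client::try_default().await.unwrap();
    let ns = "some-namespace"; // placeholder
    let events: Api<Event> = Api::namespaced(client, ns);

    let stream = watcher(events, watcher::Config::default()).applied_objects();
    pin_mut!(stream);

    // The unwrap() below is the problematic part: the first watch error
    // (e.g. the 410 "Expired" response) panics instead of letting the
    // watcher recover on its own.
    while let Some(event) = stream.try_next().await.unwrap() {
        println!(
            "Event in namespace {}: reason: {:?}, message: {:?}",
            ns, event.reason, event.message
        );
    }
}
```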
Log before it exits:
Environment
GKE Debian sid
Configuration and features
Affected crates
kube-runtime
Would you like to work on fixing this bug?
maybe