Eric-Fontana-Bose closed 5 years ago
I have not seen EOFError with recent k8s-client/kubernetes versions. What if you set a timeout, for example: watch(timeout: 600)?
Changing the timeout does not help; it is the Kube API server that terminates the connection.
@kke any ideas?
Why not just catch EOFError and restart the watch?
K8s-client could of course handle that automatically internally...
If this is still an issue, feel free to reopen / report again.
Hi, I am playing with the watch feature. I got it working for my use case but the code quits automatically after a while. What is the best way of keeping it alive? Thanks
[brain dump, not sure all relevant for k8s-client, HTH]
There are several scenarios for restarting watches.
You can restart from the last seen resourceVersion. This probably results in exactly-once delivery, most of the time.
When that collection doesn't change much, OR when you resume a significant time later, the server might refuse to resume from that resourceVersion (iirc the window depends on etcd version, 5min / 1000 events, but might include events from other collections). I think that gives you a 410 Gone HTTP status (?). Now you have 2 choices:
1. Watch from the current moment, without specifying resourceVersion. There will be an unknown gap you've missed.
2. Get/List and watch from a fresh resourceVersion. Again there will be a gap, but you'll get a fresh state to track from...
I suspect a client lib could automatically retry from the last seen version, but when that's too old it had better surface the error to the caller?
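To make the restart scenarios above concrete, here is a minimal sketch of the second choice (list again, then watch from the fresh resourceVersion), combined with resuming from the last seen version after an EOFError. It is deliberately independent of k8s-client: list_fn and watch_fn are hypothetical callables you would wrap around your own client, and ResourceVersionExpired is a made-up error class standing in for however your client reports a version that is too old (for example a 410 Gone). Treat it as an illustration of the control flow, not as the library's behaviour.

```ruby
# Hypothetical error class: stands in for whatever your client raises
# when the server refuses to resume from an old resourceVersion.
class ResourceVersionExpired < StandardError; end

# list_fn:  callable returning the collection's current resourceVersion
#           (an opaque string; never parsed or compared here).
# watch_fn: callable taking a resourceVersion and a block; it calls the
#           block with (event_version, event) for every event and may
#           raise EOFError or ResourceVersionExpired when the stream ends.
def watch_with_resume(list_fn:, watch_fn:, max_restarts: 5)
  version = nil
  restarts = 0
  begin
    # Only list when we have no usable version to resume from.
    version ||= list_fn.call
    watch_fn.call(version) do |event_version, event|
      version = event_version # remember where to resume
      yield event
    end
  rescue EOFError, ResourceVersionExpired => e
    # A stale version cannot be resumed: drop it so the next attempt re-lists.
    version = nil if e.is_a?(ResourceVersionExpired)
    restarts += 1
    retry if restarts <= max_restarts
    raise # give up and surface the error to the caller
  end
end
```

The max_restarts cap is a design choice of this sketch: it keeps a persistent failure from turning into a tight reconnect loop, and once the cap is hit the error reaches the caller, who can decide what to do about the gap.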
Compare python client discussion https://github.com/kubernetes-client/python/issues/972, https://github.com/kubernetes-client/python-base/pull/133. Specifically comment about official Go client https://github.com/kubernetes-client/python-base/pull/133#discussion_r309825984
Hi @cben! How do I restart from the last seen resourceVersion? At the moment I am first saving the list of the existing resources (Velero backups) in an array, and then when the watch starts I ignore those backups that already existed. But it sounds like it would be much better if I could restart the watcher from where it left off. How can I do this with this gem? Thanks!
OK I see that I can set the resourceVersion as parameter for the watch method. But what value do I specify initially, so that I don't just get the whole list of the existing resources?
@vitobotta Not really sure what you are trying to achieve, but usually these are handled with something like:
last_seen_resource = 0
begin
  client.api('v1').resource('pods', namespace: 'default').watch(resourceVersion: last_seen_resource) do |watch_event|
    puts "type=#{watch_event.type} pod=#{watch_event.resource.metadata.name}"
    # remember the version of the object carried by this event
    last_seen_resource = watch_event.resource.metadata.resourceVersion
  end
rescue EOFError # or something bit more specific maybe :)
  retry # makes the watch start again from last seen resource
end
So yes, initially when your app starts, you need to get all the resources through watch. If you are "syncing" the status with something external, your app needs to decide what to do in case the resource has been already seen in the past and possibly exists in the external thingy.
I ended up doing a list first, getting the max resource version and using that. Seems to work. Thanks! :)
Small correction: If doing a list, you should use the whole list's resourceVersion.
Kubernetes devs are pretty adamant that resourceVersion "MUST be treated as opaque" string. While so far it's been a number, you shouldn't assume that, shouldn't interpret it, and thus can't compute "max" of several versions. But that's why any FooList response, in addition to each item having a resourceVersion, also has a top-level resourceVersion; use that for the initial watch.
(https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md)
But your instinct was a good one, pointing to a subtle distinction between list and watch:
Hi @cben! How do I get the resourceVersion for the list? I tried with resource_version = k8s_client.api("velero.io/v1").resource(resource_type.to_s, namespace: velero_namespace).list.resourceVersion but it gives me undefined method for the array. Thanks
Found it! It's meta_list.metadata.resourceVersion, isn't it? I made that change and it seems to work, it no longer returns the existing events when I start the watch and only returns new events. Thanks! :)
Yes, that's the one I meant.
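Putting the last few comments together, a minimal sketch of "list first, then watch only new events" could look like the following. It assumes only what the thread mentions: that the resource client responds to meta_list (whose metadata carries the list-level resourceVersion) and to watch(resourceVersion:). The helper name watch_new_events is made up for this example, and the version is passed through untouched because it is opaque.

```ruby
# resource_client is assumed to behave like the object returned by
# client.api(...).resource(...) in this thread: it responds to
# meta_list and to watch(resourceVersion:).
def watch_new_events(resource_client)
  # Top-level resourceVersion of the whole list, not of any single item.
  version = resource_client.meta_list.metadata.resourceVersion

  # Start watching from that point, so resources that already existed at
  # list time are not replayed.
  resource_client.watch(resourceVersion: version) do |watch_event|
    yield watch_event
  end
end

# Hypothetical usage (the resource name 'backups' and the namespace
# 'velero' are assumptions, not taken from the thread):
#
#   backups = k8s_client.api('velero.io/v1').resource('backups', namespace: 'velero')
#   watch_new_events(backups) do |ev|
#     puts "type=#{ev.type} name=#{ev.resource.metadata.name}"
#   end
```

Note that this only covers the initial start; when the connection is later dropped you would resume from the last event's resourceVersion rather than listing again.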
I've been using https://github.com/abonas/kubeclient library, but I like the flavor of this library better. I'm a heavy user of the watch API, and when testing out this code:
After roughly 5 minutes the exception handler reports:
ERROR -- : Watcher error: end of file reached (EOFError)
I sort of expected this; if you issue kubectl get namespaces -w it will time out in about the same amount of time. We are stuck on an older version of Kubernetes (1.89) and the Kube API server was getting hammered because the abonas client was not handling the closing of the terminated connection, which was causing the API server to back up and affect the cluster.
What is the proper way to handle the timeout/EOFError?