kontena / k8s-client

Ruby Kubernetes API client
Apache License 2.0

Using watches and handling timeouts #92

Closed: Eric-Fontana-Bose closed this issue 5 years ago

Eric-Fontana-Bose commented 5 years ago

I've been using the https://github.com/abonas/kubeclient library, but I like the flavor of this library better. I'm a heavy user of the watch API, and when testing out this code:

  begin
    MyLog.log.info "Starting watcher..."
    client.api('v1').resource('namespaces').watch do |watch_event|
      puts "type=#{watch_event.type} namespace=#{watch_event.resource.metadata.name}"
    end
    MyLog.log.info "Exited normally."
  rescue StandardError => e
    MyLog.log.error "Watcher error: #{e} (#{e.class})"
  end

  MyLog.log.info "Finished watcher..."

After roughly 5 minutes the exception handler reports:

ERROR -- : Watcher error: end of file reached (EOFError)

I sort of expected this: if you run kubectl get namespaces -w, it times out after about the same amount of time.

We are stuck on an older version of Kubernetes (1.89), and the Kube API server was getting hammered because the abonas client was not handling the terminated connection correctly. This caused the API server to back up and affected the whole cluster.

What is the proper way to handle the timeout/EOFError?

jakolehm commented 5 years ago

I have not seen EOFError with recent k8s-client/Kubernetes versions. What if you set a timeout, for example watch(timeout: 600)?

Eric-Fontana-Bose commented 5 years ago

Changing the timeout does not help; it is the Kube API server that terminates the connection.

jakolehm commented 5 years ago

@kke any ideas?

jnummelin commented 5 years ago

Why not just catch EOFError and restart the watch?

Of course, k8s-client could also handle that internally and automatically...
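A minimal sketch of the "catch EOFError and restart" suggestion, assuming start_watch is any callable that opens one watch and yields its events. The names watch_forever, max_attempts and backoff are hypothetical, not part of k8s-client. It restarts whenever the stream ends, whether cleanly or with EOFError, and sleeps between attempts so a reconnect loop cannot hammer the API server. A restart without a resourceVersion starts from scratch, so the server sends the existing objects again as ADDED events.

```ruby
# Hypothetical helper: keep a watch running across server-side disconnects.
# start_watch: any callable that opens a single watch and yields each event.
# max_attempts: optional limit (nil = run forever); backoff: seconds to wait.
def watch_forever(start_watch, max_attempts: nil, backoff: 1.0)
  attempts = 0
  loop do
    attempts += 1
    begin
      start_watch.call { |event| yield event }
    rescue EOFError
      # The API server closed the stream mid-read; treat it like a normal end.
    end
    break if max_attempts && attempts >= max_attempts
    sleep(backoff) # pause before reconnecting so restarts cannot flood the API server
  end
  attempts
end

# Usage with k8s-client (not run here):
#   namespaces = client.api('v1').resource('namespaces')
#   watch_forever(->(&blk) { namespaces.watch(&blk) }) do |event|
#     puts "type=#{event.type} namespace=#{event.resource.metadata.name}"
#   end
```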

kke commented 5 years ago

If this is still an issue, feel free to reopen / report again.

vitobotta commented 5 years ago

Hi, I am playing with the watch feature. I got it working for my use case, but the watch exits on its own after a while. What is the best way to keep it alive? Thanks

cben commented 5 years ago

[brain dump, not sure it's all relevant to k8s-client, HTH]

There are several scenarios for restarting watches.

I suspect a client library could automatically retry from the last seen resourceVersion, but when that version is too old (the API server answers 410 Gone), it had better surface the error to the caller?

Compare the Python client discussion in https://github.com/kubernetes-client/python/issues/972 and https://github.com/kubernetes-client/python-base/pull/133, specifically this comment about the official Go client: https://github.com/kubernetes-client/python-base/pull/133#discussion_r309825984
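One way to read the suggestion above, sketched under assumptions: the hypothetical resumable_watch resumes from the last resourceVersion it saw when the connection drops, but lets a hypothetical ResourceVersionExpired error propagate so the caller can re-list. The sketch assumes the callable you pass raises that error when the API server reports the version as too old (410 Gone); k8s-client itself is not assumed to raise it.

```ruby
# Hypothetical error class: assume the caller's watch callable raises it when
# the API server rejects the resourceVersion as too old (410 Gone).
class ResourceVersionExpired < StandardError; end

# start_watch: callable taking a resourceVersion and yielding watch events.
# version_of: extracts the (opaque) resourceVersion string from an event.
def resumable_watch(start_watch, resource_version, version_of:, max_attempts: nil)
  attempts = 0
  loop do
    attempts += 1
    begin
      start_watch.call(resource_version) do |event|
        yield event
        # Record the version only after the event was handled successfully.
        resource_version = version_of.call(event)
      end
    rescue EOFError
      # Dropped connection: loop around and resume from resource_version.
    end
    # ResourceVersionExpired is deliberately NOT rescued: the caller has to
    # re-list to rebuild its state and then start a fresh watch.
    break if max_attempts && attempts >= max_attempts
  end
  resource_version
end
```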

vitobotta commented 5 years ago

Hi @cben! How do I restart from the last seen resourceVersion? At the moment I am first saving the list of the existing resources (Velero backups) in an array, and then when the watch starts I ignore those backups that already existed. But it sounds like it would be much better if I could restart the watcher from where it left off. How can I do this with this gem? Thanks!

vitobotta commented 5 years ago

OK, I see that I can pass resourceVersion as a parameter to the watch method. But what value do I specify initially so that I don't just get the whole list of existing resources?

jnummelin commented 5 years ago

@vitobotta Not really sure what you are trying to achieve, but usually these are handled with something like:

last_seen_resource = 0 # "0": start from any version; existing pods arrive as ADDED events
begin
  client.api('v1').resource('pods', namespace: 'default').watch(resourceVersion: last_seen_resource) do |watch_event|
    puts "type=#{watch_event.type} pod=#{watch_event.resource.metadata.name}"
    last_seen_resource = watch_event.resource.metadata.resourceVersion
  end
rescue EOFError # or something a bit more specific maybe :)
  retry # makes the watch start again from the last seen resource
end

So yes, initially when your app starts, you need to get all the resources through the watch. If you are "syncing" the status with something external, your app needs to decide what to do when a resource has already been seen in the past and possibly exists in the external thingy.
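A sketch of the "decide what to do" part, assuming a hypothetical external store keyed by metadata.uid: replayed ADDED events for objects that are already synced become no-ops or updates instead of duplicates. ExternalSync and its return symbols are illustrative names, and resourceVersion values are only compared for equality, never ordered.

```ruby
# Hypothetical sync target: remembers the last resourceVersion applied per UID.
class ExternalSync
  attr_reader :store

  def initialize
    @store = {} # uid => resourceVersion last applied
  end

  # Returns the action taken so callers can log it.
  def handle(type, uid, resource_version)
    case type
    when 'ADDED', 'MODIFIED'
      previous = @store[uid]
      return :unchanged if previous == resource_version # replayed event, nothing to do
      @store[uid] = resource_version
      previous ? :updated : :created
    when 'DELETED'
      @store.delete(uid) ? :deleted : :unchanged
    else
      :ignored # e.g. ERROR or BOOKMARK events are handled elsewhere
    end
  end
end
```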

vitobotta commented 5 years ago

I ended up doing a list first, getting the max resource version and using that. Seems to work. Thanks! :)

cben commented 5 years ago

Small correction: If doing a list, you should use the whole list's resourceVersion.

Kubernetes devs are pretty adamant that resourceVersion "MUST be treated as opaque": it is a string, and while so far it has been a number, you shouldn't assume that or interpret it, and thus can't compute the "max" of several versions. That's why any FooList response, in addition to each item having a resourceVersion, also has a top-level resourceVersion; use that for the initial watch. (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md)
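To illustrate with made-up data (POD_LIST and both helper names are hypothetical): the list's own metadata.resourceVersion is what the initial watch should start from. Taking a "max" over the item versions is the interpretation the API conventions warn against, and in Ruby it even goes wrong on plain string comparison.

```ruby
# Made-up PodList response as parsed JSON (trimmed to the relevant fields).
POD_LIST = {
  'kind' => 'PodList',
  'metadata' => { 'resourceVersion' => '1050' },
  'items' => [
    { 'metadata' => { 'name' => 'a', 'resourceVersion' => '9' } },
    { 'metadata' => { 'name' => 'b', 'resourceVersion' => '10' } }
  ]
}.freeze

# Start the watch from the version the whole list was read at.
def initial_watch_version(list)
  list.fetch('metadata').fetch('resourceVersion')
end

# The tempting alternative: a "max" over the items. Ruby compares the strings
# lexically, so "9" wins over "10".
def max_item_version(list)
  list.fetch('items').map { |item| item.fetch('metadata').fetch('resourceVersion') }.max
end

puts initial_watch_version(POD_LIST) # prints 1050
puts max_item_version(POD_LIST)      # prints 9
```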


But your instinct was a good one, pointing to a subtle distinction between list and watch:

vitobotta commented 5 years ago

Hi @cben! How do I get the resourceVersion for the list? I tried resource_version = k8s_client.api("velero.io/v1").resource(resource_type.to_s, namespace: velero_namespace).list.resourceVersion, but it gives me an undefined method error on the Array that list returns. Thanks

vitobotta commented 5 years ago

Found it! It's meta_list.metadata.resourceVersion, isn't it? I made that change and it seems to work: it no longer returns the existing events when I start the watch and only returns new events. Thanks! :)

cben commented 5 years ago

Yes, that's the one I meant.
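Putting the thread together, a sketch of list-then-watch with k8s-client that uses only the meta_list and watch(resourceVersion:) calls discussed above. watch_after_list is a hypothetical helper name. The resource client is passed in (for example client.api('velero.io/v1').resource('backups', namespace: 'velero'), where 'backups' and 'velero' are illustrative) so the method can be exercised without a cluster.

```ruby
# Hypothetical helper: list first, then watch only for changes after the list.
# resource_client: the object returned by client.api(...).resource(...).
def watch_after_list(resource_client, &on_event)
  list = resource_client.meta_list
  # The existing objects come back in the list itself; the watch starts from
  # the list's own resourceVersion, so it reports only later changes.
  resource_client.watch(resourceVersion: list.metadata.resourceVersion, &on_event)
end
```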