kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0
311 stars 43 forks source link

Resource updates are not retried on conflicts #379

Closed SpComb closed 5 years ago

SpComb commented 6 years ago

Other kube operators might be modifying the resources at any time, causing our GET => PUT operations to fail with conflicts... that is normal and to be expected, and the modification should be retried. Alternatively, just re-run the entire pharos-cluster up, but that's pretty heavyweight...

==> Configure DNS @ 167.99.39.233
    [167.99.39.233] Patching kube-dns addon with 2 replicas (max-surge 1, max-unavailable 1)...
Operation cannot be fulfilled on deployments.extensions "kube-dns": the object has been modified; please apply your changes to the latest version and try again
/usr/local/bundle/gems/kubeclient-3.0.0/lib/kubeclient/common.rb:116:in `rescue in handle_exception'
/usr/local/bundle/gems/kubeclient-3.0.0/lib/kubeclient/common.rb:106:in `handle_exception'
/usr/local/bundle/gems/kubeclient-3.0.0/lib/kubeclient/common.rb:335:in `update_entity'
/app/lib/pharos/kube/client.rb:48:in `update_resource'
/app/lib/pharos/kube/resource.rb:45:in `update'
/app/lib/pharos/phases/configure_dns.rb:93:in `patch_kubedns'
/app/lib/pharos/phases/configure_dns.rb:9:in `call'
/app/lib/pharos/phase_manager.rb:70:in `block in apply'
/app/lib/pharos/phase_manager.rb:40:in `block in run_serial'
/app/lib/pharos/phase_manager.rb:39:in `map'
/app/lib/pharos/phase_manager.rb:39:in `run_serial'
/app/lib/pharos/phase_manager.rb:51:in `run'
/app/lib/pharos/phase_manager.rb:67:in `apply'
/app/lib/pharos/cluster_manager.rb:106:in `apply_phase'
/app/lib/pharos/cluster_manager.rb:87:in `apply_phases'
/app/lib/pharos/up_command.rb:94:in `configure'
/app/lib/pharos/up_command.rb:41:in `block in execute'
/app/lib/pharos/up_command.rb:40:in `chdir'
/app/lib/pharos/up_command.rb:40:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
/app/lib/pharos/root_command.rb:14:in `run'
bin/pharos-cluster:12:in `<main>'
SpComb commented 6 years ago

Note that "retry" here covers the entire "GET" + "PUT" pair... the conventional approach for these kinds of modify operations would look something like:

kube.api('extensions/v1beta1').resources('Deployment', namespace: 'kube-system').modify('kube-dns') do |resource|
  resource.spec.replicas = ...
  ...
end

Performing a GET + yield to modify + PUT.

The current kube_session.resource(...).update is a bit odd, because it does a GET + merge + PUT using the partial resource data given... it could be rewritten to just use PATCH instead to avoid the need for retries, if the changes do not depend on the current state of the resource?

jakolehm commented 6 years ago

it could be rewritten to just use PATCH instead to avoid the need for retries, if the changes do not depend on the current state of the resource?

Patch acted a bit weird on some cases.

jnummelin commented 5 years ago

I think we can close this. There's now retry baked in to phases and addons, plus there's fixes in k8s-client too