neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

neonvm-runner-image-loader: Faster rollout #923

Closed sharnoff closed 5 months ago

sharnoff commented 5 months ago

Specifically, this PR sets maxUnavailable = 100%, which allows updating all replicas in parallel.

An issue we see during rollout is that it often takes 30+ seconds to create a new neonvm-runner-image-loader on a node, and doing this for many nodes a few at a time takes an unreasonable amount of time. Updating all nodes in parallel avoids that serialization.

cicdteam commented 5 months ago

@sharnoff

Why is it hardcoded to 3? maxUnavailable and maxSurge can be percentages, like:

  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
      maxSurge: 100%

If we have 10 pods, then on rollout all 10 old pods will be stopped and 10 new pods will be started at the same time.
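
To make the difference between a fixed value of 3 and 100% concrete, here is a rough Python sketch of how an int-or-percent setting resolves against a pod count, and how many sequential batches a rollout would then need. The helper names `resolve_int_or_percent` and `rollout_batches` are hypothetical, rounding up is an assumption, and treating the rollout as discrete batches is a simplification: a real rolling update replaces pods continuously as new ones become ready.

```python
def resolve_int_or_percent(value, total, round_up=True):
    """Resolve a setting such as 3 or "100%" to an absolute pod count.

    Hypothetical helper for illustration; the rounding direction is an
    assumption and is exposed as a parameter for that reason.
    """
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        if round_up:
            # Integer ceiling of percent * total / 100.
            return (percent * total + 99) // 100
        return (percent * total) // 100
    raise ValueError(f"expected an int or a percentage string, got {value!r}")


def rollout_batches(total, max_unavailable):
    """Estimate how many sequential batches replace `total` pods.

    Simplified model: each batch replaces up to max_unavailable pods,
    and the next batch starts only when the previous one has finished.
    """
    per_batch = resolve_int_or_percent(max_unavailable, total)
    per_batch = max(1, min(total, per_batch))
    # Integer ceiling of total / per_batch.
    return (total + per_batch - 1) // per_batch


if __name__ == "__main__":
    for setting in (3, "100%"):
        print(setting, "->", rollout_batches(10, setting), "batch(es)")
```

Under this model, 10 pods with a limit of 3 need 4 batches, while 100% needs 1. If each new pod takes roughly 30 seconds to come up, that is the difference between about two minutes and about 30 seconds, and the gap grows with the number of nodes.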

sharnoff commented 5 months ago

Why is it hardcoded to 3?

Yeah, I considered making it a percentage, but I'm not sure if doing a simultaneous image pull cluster-wide could potentially cause issues, so better to be safe.

If you think it's definitely ok, I'll bump it higher, if +1 from @Omrigan.

Omrigan commented 5 months ago

Why is it hardcoded to 3?

Yeah, I considered making it a percentage, but I'm not sure if doing a simultaneous image pull cluster-wide could potentially cause issues, so better to be safe.

If you think it's definitely ok, I'll bump it higher, if +1 from @Omrigan.

I don't see the reason why 100% could cause issues.

sharnoff commented 5 months ago

Fair enough, went with 100%

sharnoff commented 4 months ago

After trying it out, 100% was definitely the right move :)