kubernetes-sigs / cli-utils

This repo contains binaries that built from libraries in cli-runtime.
Apache License 2.0
154 stars 77 forks source link

Bug: Inventory updates should tolerate drift (and overwrite it) #559

Open karlkfi opened 2 years ago

karlkfi commented 2 years ago

Right now, inventory updates may return a conflict error from Kubernetes. The inventory client should detect this (apierrors.IsConflict(err)) and retry with a new Get (to update the ResourceVersion) + Update.

Example retry code:

type retriable func(ctx context.Context) (retry bool, err error)

func retryWithBackoff(ctx context.Context, timeout time.Duration, fn retriable) error {
    var err error
    var retry bool
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()
    delay := 1 + time.Second
    for {
        // attempt to update status
        retry, err = fn(ctx)
        if !retry {
            return err
        }

        // wait until delay or timeout
        timer := time.NewTimer(delay)
        select {
        case <-ctx.Done():
            timer.Stop()
            return fmt.Errorf("timed out after retrying for %v: %w", timeout, err)
        case <-timer.C:
            // continue
        }
        // retry backoff
        delay = delay * 2
    }
}

example usage:

    // attempt to update status until timeout
    ctx := context.TODO()
    timeout := 1 * time.Minute
    return retryWithBackoff(ctx, timeout, func(ctx context.Context) (retry bool, err error) {
        // Get the object to get the latest ResourceVersion.
        latestObj, err := resource.Get(ctx, obj.GetName(), metav1.GetOptions{TypeMeta: meta})
        if err != nil {
            return false, fmt.Errorf("failed to get inventory status from cluster: %w", err)
        }
        // Ignore any status changes made remotely.
        // This update will replace them.
        obj.SetResourceVersion(latestObj.GetResourceVersion())

        _, err = resource.UpdateStatus(ctx, obj, metav1.UpdateOptions{TypeMeta: meta})
        if err != nil {
            // retry if conflict
            return apierrors.IsConflict(err), fmt.Errorf("failed to write updated inventory status to cluster: %w", err)
        }
        return false, nil
    })

Another option is to use https://github.com/flowchartsman/retry which is nice and generic. gcloud and client-go also have retry libs.

karlkfi commented 2 years ago

The main client causing drift right now is the Config Sync resource-group-controller, which updates the ResourceGroup (inventory) status.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

karlkfi commented 2 years ago

/remove-lifecycle rotten /lifecycle frozen