pulumi / pulumi-kubernetes

A Pulumi resource provider for Kubernetes to manage API resources and workloads in running clusters
https://www.pulumi.com/docs/reference/clouds/kubernetes/
Apache License 2.0

Increase retry count on Custom Resources waiting for CRDs #1446

Open vyrwu opened 3 years ago

vyrwu commented 3 years ago

When installing the kube-prometheus-stack chart with custom Prometheus rules using the helm.v3 library (TypeScript), some custom resources time out with the following error:

authorizer.rules (iac:services:KubePrometheusStack$kubernetes:monitoring.coreos.com/v1:PrometheusRule)
Retry #0; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
Retry #1; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
Retry #2; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
Retry #3; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
Retry #4; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
Retry #5; creation failed: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"

error: creation of resource cluster-services/authorizer.rules failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"

The custom resources apparently can't find their CRD. The CRD would be created and discoverable if Pulumi waited a few minutes, since this deployment creates many different Kubernetes resources. I tried adding a custom timeout of 30 minutes via a transformation on the Helm chart, but without success (the error log above was thrown after 7 minutes of build time):

transformations.push((obj: any, opts: CustomResourceOptions) => {
    if (obj.apiVersion === 'monitoring.coreos.com/v1' && obj.kind === 'PrometheusRule') {
        opts.customTimeouts = {
            create: '30m', // <= Pulumi does not respect this in this scenario
        }
    }
})

I want to tell Pulumi to keep retrying more than 5 times, because I know that the CRD will eventually be found. Does anyone have a suggestion on how to tackle this?
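
For illustration only, the behaviour requested here amounts to a retry loop whose attempt count is configurable. The sketch below is not how Pulumi implements its retries and is not an API of @pulumi/kubernetes; retryWithBackoff and simulatedCreate are hypothetical names, and the "CRD appears on the eighth attempt" timing is an assumption made up for the example.

```typescript
// Hypothetical helper, NOT part of Pulumi or @pulumi/kubernetes.
// Retries an async operation up to maxAttempts times, doubling the delay
// between attempts, and rethrows the last error if every attempt fails.
async function retryWithBackoff<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts: number,
  initialDelayMs: number,
): Promise<T> {
  let delayMs = initialDelayMs;
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is still going to happen.
      if (attempt < maxAttempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
        delayMs *= 2;
      }
    }
  }
  throw lastError;
}

// Simulated create call (assumption for this example): the CRD only becomes
// visible to the API server on the eighth attempt (attempt index 7).
function simulatedCreate(attempt: number): Promise<string> {
  if (attempt < 7) {
    return Promise.reject(
      new Error('no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"'),
    );
  }
  return Promise.resolve("created");
}
```

With six attempts (Retry #0 through Retry #5, as in the log above) the simulated call still fails, while a larger limit such as retryWithBackoff(simulatedCreate, 10, 1000) would succeed once the CRD is registered.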

vyrwu commented 3 years ago

@mikhailshilkov I would consider this a bug, not an enhancement. This prevents me from deploying that chart using Pulumi.

eduanb commented 3 years ago

I have the same issue with cert-manager. I've tried adding a dependsOn, but that doesn't seem to work either.

mremes commented 9 months ago

@eduanb did you manage to work around this? CRDs don't get created in time before Issuers are created.

eduanb commented 9 months ago

Hi @mremes, I can't remember if I ever found a solution. I would suggest trying server-side apply: https://www.pulumi.com/registry/packages/kubernetes/how-to-guides/managing-resources-with-server-side-apply/

KristapsT commented 3 months ago

Encountered the same issue with version 4.15.0, which already has server-side apply enabled by default, so server-side apply alone doesn't fix this.

As a workaround, I ended up configuring two Kubernetes providers, one with server-side apply and one with client-side apply. I then use the server-side apply provider for CRD creation and the client-side apply provider for custom resource creation, something like this:

import * as k8s from "@pulumi/kubernetes";

// server-side provider
const k8sProviderSSA = new k8s.Provider("aksK8sProviderSSA", {
  enableServerSideApply: true,
});

// client-side provider
const k8sProviderCSA = new k8s.Provider("aksK8sProviderCSA", {
  enableServerSideApply: false,
});

// create crds
const crds = new k8s.yaml.ConfigFile(
  "crds",
  { file: "./crds.yaml" },
  { provider: k8sProviderSSA }
);

// create custom resources using crds
const customResources = new k8s.yaml.ConfigFile(
  "customResources",
  { file: "./customResources.yaml" },
  {
    dependsOn: [crds],
    provider: k8sProviderCSA,
  }
);

That got rid of the problem for me. Custom resources still take 1 or 2 tries to get created, but at least the deployment no longer fails entirely.
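
The workaround above assumes the CRDs and the custom resources already live in separate files. When they come mixed together (for example, from a rendered chart), a split along the following lines might help. This is only a sketch: the Manifest type and the partitionManifests helper are hypothetical and not part of @pulumi/kubernetes, and the manifests are assumed to be already parsed into plain objects.

```typescript
// Minimal shape of a parsed Kubernetes manifest (hypothetical type for this sketch).
interface Manifest {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
}

// Hypothetical helper: separates CustomResourceDefinition objects from
// everything else, preserving the original order within each group.
function partitionManifests(
  manifests: Manifest[],
): { crds: Manifest[]; others: Manifest[] } {
  const crds: Manifest[] = [];
  const others: Manifest[] = [];
  for (const manifest of manifests) {
    const isCrd =
      manifest.kind === "CustomResourceDefinition" &&
      manifest.apiVersion.startsWith("apiextensions.k8s.io/");
    (isCrd ? crds : others).push(manifest);
  }
  return { crds, others };
}
```

The crds group could then be written to crds.yaml and the others group to customResources.yaml, matching the two ConfigFile resources in the workaround.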