hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Cannot apply CRD and a CR using it in the same plan/apply due to SSA #1367

Open heschlie opened 3 years ago

heschlie commented 3 years ago

Terraform version, Kubernetes provider version and Kubernetes version

Terraform version: 0.14.11
Kubernetes Provider version: 2.4.1
Kubernetes version: EKS 1.17

Terraform configuration

There is a bit going on here, but essentially this is the output from the Terraform Flux provider, and through some HCL abuse I'm massaging it into the right format.

resource "kubernetes_manifest" "install" {
  for_each   = { for manifest in local.install_manifest : join("-", [manifest.kind, manifest.metadata.name]) => manifest }
  depends_on = [kubernetes_namespace.flux_system]
  manifest   = each.value
}

resource "kubernetes_manifest" "sync" {
  for_each   = { for manifest in local.sync_manifest : join("-", [manifest.kind, manifest.metadata.name]) => manifest }
  depends_on = [kubernetes_manifest.install]
  manifest   = each.value
}

Question

Essentially I am using the kubernetes_manifest resource, and am trying to:

  1. Deploy some custom resource definitions
  2. Deploy some custom resources using the above definitions

Upon doing this I am greeted with an error during the plan, because the CRDs have not been created yet and server-side apply (SSA) is not happy about it:

Acquiring state lock. This may take a few moments...

Error: Failed to determine GroupVersionResource for manifest

  on main.tf line 49, in resource "kubernetes_manifest" "sync":
  49: resource "kubernetes_manifest" "sync" {

no matches for kind "Kustomization" in group "kustomize.toolkit.fluxcd.io"

Error: Failed to determine GroupVersionResource for manifest

  on main.tf line 49, in resource "kubernetes_manifest" "sync":
  49: resource "kubernetes_manifest" "sync" {

no matches for kind "GitRepository" in group "source.toolkit.fluxcd.io"

Releasing state lock. This may take a few moments...
ERRO[0105] Hit multiple errors:
Hit multiple errors:
exit status 1

Is there a way to tell the provider that things are OK and not to try to plan this? It seems like a bug, or a feature required before this comes out of experimental, as asking someone to first apply the CRDs and then add and apply the CRs doesn't seem like a valid long-term solution.

charlierm commented 5 months ago

Any update on this at all? It means we're having to put everything in separate stacks.

pepsipu commented 4 months ago

this sent me back to pulumi 😭

atheiman commented 4 months ago

@alexsomesan could the kubernetes provider include a second resource (let's just call it kubernetes_manifest_functional) that does not make all the plan promise guarantees you describe, but does satisfy all the use cases defined here?

YarekTyshchenko commented 4 months ago

@alexsomesan could the kubernetes provider include a second resource (lets just call it kubernetes_manifest_functional) that does not make all the plan promise guarantees you describe but does satisfy all the use cases defined here?

But this defeats the point, as you'd basically never be able to use the non-functional version.

IMO most of these "errors" should be folded into the "manifest does not exist yet" case:

Edit: I of course agree that this is best solved inside Terraform itself, but if wishes were fishes.

fabn commented 4 months ago

Really, after all these years we still have this issue?

Why wouldn't this work?

May have been said already, but another potential way of solving this would be to (a rough sketch follows below):

  • Allow a resource to declare that it provides a given CRD.

  • Have Terraform trust that declaration until the apply phase.

  • If, during apply, the CRD still does not exist after the providing resource has been applied, then fail.

Does this make sense?
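To make the proposal concrete, here is a purely hypothetical sketch; the provides argument does not exist in the provider today, and all names and file paths are illustrative only:

# Hypothetical syntax: "provides" is NOT a real kubernetes_manifest argument;
# it only illustrates the proposal above.
resource "kubernetes_manifest" "kustomization_crd" {
  manifest = yamldecode(file("${path.module}/crds/kustomizations.yaml"))

  # Declare that applying this resource makes the CRD available, so dependent
  # resources can be planned on trust and verified at apply time.
  provides = {
    group = "kustomize.toolkit.fluxcd.io"
    kind  = "Kustomization"
  }
}

resource "kubernetes_manifest" "sync" {
  depends_on = [kubernetes_manifest.kustomization_crd]
  manifest   = yamldecode(file("${path.module}/sync/kustomization.yaml"))
}

Under this scheme, if the CRD still did not exist after kubernetes_manifest.kustomization_crd had been applied, the run would fail at that point rather than at plan time.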

gagbo commented 4 months ago

So meanwhile the only way to resolve the issue is to comment out the kubernetes_manifest resources that rely on CRDs, apply, then uncomment the manifests and reapply? Not sure I'm understanding this correctly.

alekc commented 4 months ago

That, or split the roll-out into different stages (first the CRD install, then the manifests using it), or use the alekc/kubectl provider as an alternative.
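For anyone considering the second option, a minimal sketch assuming the alekc/kubectl provider; the Kustomization YAML and resource names below are only examples, not taken from the original configuration:

terraform {
  required_providers {
    kubectl = {
      source = "alekc/kubectl"
    }
  }
}

# kubectl_manifest resolves the API group/kind at apply time rather than at
# plan time, so a CR can be created in the same run as the CRD it needs.
resource "kubectl_manifest" "flux_sync" {
  depends_on = [kubernetes_manifest.install]

  yaml_body = <<-YAML
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: flux-system
      namespace: flux-system
    spec:
      interval: 10m
      path: ./clusters/my-cluster
      prune: true
      sourceRef:
        kind: GitRepository
        name: flux-system
  YAML
}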


heschlie commented 4 months ago

Not so far off from the 3-year mark since I opened this issue. Even back then we were opposed to deploying k8s resources via Terraform, but deploying Flux with it to help bootstrap the cluster seemed like a good idea. This was before Flux offered a provider to do the bootstrapping; we migrated to that once it was available, IIRC.

I suppose the lesson here is: don't manage your k8s resources in TF unless you absolutely have to, because (and this is probably not a surprise to most folks) how k8s manages its resources isn't very compatible with TF.

alekc commented 4 months ago

It has nothing to do with k8s and everything to do with how the Terraform Kubernetes provider has been written. By design, it prepares the plan for all entries right at the beginning, which requires the CRDs to already be present. Some other providers use dynamic binding, so they can overcome this particular problem.


sdemjanenko commented 3 months ago

Reflecting on the previous comment, I propose that Terraform support multiple plan + apply cycles. Specifically, if Terraform detects that a Custom Resource (CR) cannot be planned because its CRD isn't yet installed, it could defer the CR to a subsequent cycle, provided that it can still make progress in the current cycle.

During the apply phase, Terraform would only execute the actions from the completed plan and then indicate if additional cycles are needed.

This approach would be more efficient than the current workarounds, which involve separating resources into different sets of Terraform files, or temporarily commenting out parts of the code or using conditional logic to manage resource dependencies. Such enhancements would streamline the development process, allowing Terraform to handle resource ordering more intuitively and reducing the manual effort required to ensure resources are applied in the correct order.
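As an illustration of the conditional-logic workaround mentioned above, a sketch that gates the CRs behind a variable (the variable name is made up): the first apply installs only the CRDs, and a second apply with -var crds_installed=true adds the CRs.

variable "crds_installed" {
  description = "Set to true once a first apply has installed the CRDs."
  type        = bool
  default     = false
}

resource "kubernetes_manifest" "sync" {
  # Planned as an empty set until the CRDs are known to exist on the cluster.
  for_each   = var.crds_installed ? { for m in local.sync_manifest : join("-", [m.kind, m.metadata.name]) => m } : {}
  depends_on = [kubernetes_manifest.install]
  manifest   = each.value
}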

Update:

voronin-ilya commented 2 months ago

The best workaround I've found is to package the custom resource YAMLs into a small Helm chart, bundle it with the Terraform module code, and then install it using the helm_release resource:

resource "helm_release" "custom_resources" {
  name  = "custom_resources"
  chart = "${path.module}/custom_resources"

  depends_on = [
    helm_release.crds
  ]
}

swissbuechi commented 2 months ago

I did exactly the same thing for the cert-manager clusterissuer.

cert-manager.tf
charts/cert-manager-clusterissuer
├── Chart.yaml
└── templates/
    ├── cluster_issuer_prod.yaml
    └── cluster_issuer_staging.yaml

cert-manager.tf:

locals {
  solvers_ingress_class_name = "ingress-nginx"
}

resource "kubernetes_namespace_v1" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

resource "helm_release" "cert_manager" {
  name        = kubernetes_namespace_v1.cert_manager.metadata.0.name
  repository  = "https://charts.jetstack.io"
  chart       = kubernetes_namespace_v1.cert_manager.metadata.0.name
  version     = var.cert_manager_helm_version
  namespace   = kubernetes_namespace_v1.cert_manager.metadata.0.name
  max_history = 1
  set {
    name  = "installCRDs"
    value = "true" # let the cert-manager chart install its own CRDs
  }
}

resource "helm_release" "cert_manager_clusterissuer" {
  name        = "cert-manager-clusterissuer"
  chart       = "${path.module}/charts/cert-manager-clusterissuer"
  max_history = 1
  set {
    name  = "acme_email"
    value = var.acme_email
  }
  set {
    name  = "solvers_ingress_class_name"
    value = local.solvers_ingress_class_name
  }
  depends_on = [
    helm_release.cert_manager
  ]
}

Chart.yaml:

apiVersion: v2
name: cert-manager-clusterissuer
version: 0.1.0

cluster_issuer_prod.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          ingressClassName: {{ .Values.solvers_ingress_class_name }}

cluster_issuer_staging.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          ingressClassName: {{ .Values.solvers_ingress_class_name }}