hashicorp / terraform-cloud-operator

Kubernetes Operator allows managing HCP Terraform resources via Kubernetes Custom Resources.
https://developer.hashicorp.com/terraform/cloud-docs
Mozilla Public License 2.0

v1 to v2 migration: Confirmation around CRD patching #411

Open nabadger opened 1 month ago

nabadger commented 1 month ago

Operator Version, Kind and Kubernetes Version

YAML Manifest File

n/a

Output Log

Kubectl Outputs

Question

Regarding https://developer.hashicorp.com/terraform/cloud-docs/integrations/kubernetes/ops-v2-migration it suggests that once we've migrated our v1 workspace resources, we should apply a patch to the CRD:

kubectl patch crd workspaces.app.terraform.io --patch-file workspaces_patch_b.yaml

Would it be possible to clarify the following:

  1. During the helm install of the v2 operator, it's possible that we bring in the updated CRDs. Do the patches conflict with this, or should we explicitly not bring in the Workspace CRD update during the helm install? Our provisioning tools would allow for this if required (we use Jsonnet + Helm so can patch as required).
  2. Is patch B required, or could we instead just install the CRD as defined in the helm chart? I think the end goal here would be to align with the helm chart version.

Thanks

References

Community Note

arybolovlev commented 1 month ago

Hi @nabadger,

Helm does not upgrade CRDs when the upgrade operation is performed. That is why, if you have v1 installed and then install v2 via Helm, it will only add the Module, AgentPool, and Project CRDs.
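If it helps to confirm what Helm actually installed, you can list the operator's CRDs and the API versions each one defines; a minimal check (the exact set of CRDs depends on which chart versions you have installed):

# list the operator's CRDs and the versions each one defines
kubectl get crd -o custom-columns='NAME:.metadata.name,VERSIONS:.spec.versions[*].name' | grep app.terraform.io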

If you install v2 before applying patch A, you will see in v2 logs that it cannot watch v1alpha2 resources:

2024-05-24T13:48:37Z    ERROR   controller-runtime.source.EventHandler  if kind is a CRD, it should be installed before calling Start   {"kind": "Workspace.app.terraform.io", "error": "no matches for kind \"Workspace\" in version \"app.terraform.io/v1alpha2\""}

And the pod will eventually restart:

2024-05-24T13:50:27Z    ERROR   setup   problem running manager {"error": "failed to wait for workspace caches to sync: timed out waiting for cache to be synced for Kind *v1alpha2.Workspace"}

Patch A updates existing workspaces.app.terraform.io CRD by adding v1alpha2 support:

From:

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: workspaces.app.terraform.io
spec:
  versions:
    - name: v1alpha1
      served: true
      storage: true

To:

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: workspaces.app.terraform.io
spec:
  versions:
    - name: v1alpha1
      served: false
      storage: false
      ...
    - name: v1alpha2
      served: true
      storage: true
      ...

There are two crucial fields under each version, served and storage:

As you can see, v1alpha2 (v2) has storage set to true. It means that once you apply patch A, Kubernetes will update existing objects in etcd to be compatible with the v1alpha2 schema. That is the reason why some fields will disappear from CRs and manual intervention will be required. Because of that, v1alpha1 has served set to false; the fields it requires are not in the CRs anymore anyway.
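If you want to verify what state the CRD is in after each patch, kubectl can print these two flags per version; a small sketch:

# print the served/storage flags for each version of the Workspace CRD
kubectl get crd workspaces.app.terraform.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'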

At this point, the v1 operator cannot serve CRs.

The only difference between patches A and B is that patch A does not include status.runStatus under the v1alpha2 version; this is due to its incompatibility with the same field in v1alpha1. Once the v2 operator runs its first reconciliation, it updates the resource statuses, and at that point patch B can be applied to bring the CRD fully in line with the v2 schema.

At this point, we don't need v1alpha1 anymore and technically we can remove it from the CRD.
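Putting it together, the overall order of operations looks roughly like this; a sketch assuming the patch file names from the migration guide (workspaces_patch_a.yaml is assumed here, following the naming of the patch B file shown in the question):

# 1. add v1alpha2 to the existing CRD; storage switches to v1alpha2, v1alpha1 is no longer served
kubectl patch crd workspaces.app.terraform.io --patch-file workspaces_patch_a.yaml
# 2. install/upgrade the v2 operator and let it run a first reconciliation,
#    so that Workspace statuses are rewritten under the v1alpha2 schema
# 3. bring the CRD fully in line with the v2 schema (adds status.runStatus)
kubectl patch crd workspaces.app.terraform.io --patch-file workspaces_patch_b.yaml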

In short, I think that if you replace the CRD instead of applying patch B you will get the same result, but I have never tried this. :-)

I will try it in my lab and let you know the result, but in any case, I would strongly recommend testing it out first in a lab environment.

Thanks!

nabadger commented 1 month ago

Thanks - in my test stack I was able to go from patch A to the Helm version of the CRD (I will do some more checks).

I did hit an issue, but I suspect this would also be the case when going from patch B.

When attempting to apply the latest Helm version of the CRD, I got:

CustomResourceDefinition.apiextensions.k8s.io "workspaces.app.terraform.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions

The status of the Workspace CRD was:

  acceptedNames:
    kind: Workspace
    listKind: WorkspaceList
    plural: workspaces
    singular: workspace
  conditions:
  - lastTransitionTime: "2024-05-23T13:38:24Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2024-05-23T13:38:24Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1alpha1
  - v1alpha2

Note that all resources in my cluster were updated to use v1alpha2 (the v2 operator) prior to trying to update to the latest CRD.

I've read about this issue on other operators (i.e. https://github.com/elastic/cloud-on-k8s/issues/2196#issuecomment-560294302 )

In order to get around this, I removed v1alpha1 from storedVersions using kubectl edit crd workspaces.app.terraform.io --subresource status
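A scripted alternative to the interactive edit, assuming a kubectl release where --subresource is available on patch (roughly v1.24 or newer), is a merge patch on the status subresource; note that this replaces the whole storedVersions list:

# drop v1alpha1 by replacing the storedVersions list (a merge patch replaces lists wholesale)
kubectl patch crd workspaces.app.terraform.io --subresource status --type merge -p '{"status":{"storedVersions":["v1alpha2"]}}'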

vadim-kubasov commented 2 weeks ago

In my case, I resolved the following error with these steps:

CustomResourceDefinition.apiextensions.k8s.io "workspaces.app.terraform.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions

  1. I edited the CRD via kubectl edit crd workspaces.app.terraform.io and replaced v1alpha1 with v1alpha2 in storedVersions

From:

  storedVersions:
  - v1alpha1

To:

  storedVersions:
  - v1alpha2
  2. After that, I updated my CRD via the command:
    $ kubectl replace -f https://raw.githubusercontent.com/hashicorp/terraform-cloud-operator/v2.4.1/charts/terraform-cloud-operator/crds/app.terraform.io_workspaces.yaml

And the Workspace CRD was finally updated.
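For anyone following the same steps, a quick way to confirm the end state is to check that only v1alpha2 remains recorded; a minimal check:

# storedVersions should now only list v1alpha2
kubectl get crd workspaces.app.terraform.io -o jsonpath='{.status.storedVersions}'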