projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Helm upgrade from 28.2 to 29.0 fails #9432

Closed · Ergamon closed this issue 3 weeks ago

Ergamon commented 3 weeks ago

I have now tested the upgrade in several of our clusters and the result is always the same, so I have to roll back.

Expected Behavior

Everything stays up and running, just on the latest version.

Current Behavior

I upgrade via helm:

helm upgrade calico calico/tigera-operator -n tigera-operator

and the apiserver pods go into CrashLoopBackOff.

From the logs I can see:

I1102 12:23:19.358857       1 plugins.go:157] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I1102 12:23:19.358912       1 plugins.go:160] Loaded 2 validating admission controller(s) successfully in the following order: ValidatingAdmissionPolicy,ValidatingAdmissionWebhook.
E1102 12:23:19.365472       1 resource.go:106] Failed initializing client: "resource does not exist: Tier(default) with error: the server could not find the requested resource (post Tiers.crd.projectcalico.org)"
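(For reference, I pulled the pod status and logs with plain kubectl; the namespace and deployment name below assume the default operator layout, where the Calico API server runs as calico-apiserver in the calico-apiserver namespace, so adjust if your install differs.)

kubectl get pods -n calico-apiserver
kubectl logs -n calico-apiserver deployment/calico-apiserver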

Possible Solution

I can only guess. From the logs I suspect that the API servers are responsible for creating the CRDs, but since some CRDs are already there from the previous version, they fail to add the new ones. (The old and working 28.2 installation has no Tiers objects.)
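One way to check this guess is to list the Calico CRDs that are actually installed and see whether the Tier CRD from the error message is missing (commands are just illustrative):

kubectl get crd | grep crd.projectcalico.org
kubectl get crd tiers.crd.projectcalico.org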

Steps to Reproduce (for bugs)

  1. Install Calico 28.2 via helm
  2. Upgrade to 28.0 via helm
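For completeness, the two steps above boil down to something like this for me (the chart repo alias and version pins are just how my setup looks, adjust as needed):

helm repo add calico https://docs.tigera.io/calico/charts
helm repo update
helm install calico calico/tigera-operator --version v3.28.2 -n tigera-operator --create-namespace
helm upgrade calico calico/tigera-operator --version v3.29.0 -n tigera-operator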

Your Environment

ocp1006 commented 3 weeks ago

@Ergamon I think you have a typo. You probably meant "Upgrade to 29.0 via helm". I found that applying

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: tiers.crd.projectcalico.org
spec:
  group: crd.projectcalico.org
  names:
    kind: Tier
    listKind: TierList
    plural: tiers
    singular: tier
  scope: Cluster
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              order:
                type: integer
            required:
            - order
            type: object
        type: object
    served: true
    storage: true

That solved my issue, but I wonder if it's supposed to be supplied as part of calico's apiserver.yaml
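For anyone else hitting this: the manifest above can be saved to a local file (the name tier-crd.yaml is arbitrary), applied, and then the crashing apiserver pods should recover on their own on the next restart:

kubectl apply -f tier-crd.yaml
kubectl get pods -n calico-apiserver -w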

Ergamon commented 3 weeks ago

Yes, that is what I am trying.

I made a new attempt just now, pre-applying your fix.

Indeed, adding the custom resource definition manually makes the API servers start again.

But still no success in completing the upgrade.

The Typha pods are not crashing like the API servers, but they are running without ever reaching the ready state.

When I look at the logs I see the following:

2024-11-03 23:46:35.524 [INFO][1] typha/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesadminnetworkpolicies" error=resource does not exist: KubernetesAdminNetworkPolicy with error: the server could not find the requested resource (get adminnetworkpolicies.policy.networking.k8s.io)

I guess something similar to the fix for the API servers would help here as well, but creating these should really be the job of the helm chart.
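If I read the error right, the missing piece this time is the AdminNetworkPolicy CRD (adminnetworkpolicies.policy.networking.k8s.io, name taken straight from the Typha log). A quick check, assuming kubectl access to the cluster:

kubectl get crd adminnetworkpolicies.policy.networking.k8s.io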

fasaxc commented 3 weeks ago

Did you follow the Helm upgrade steps: https://docs.tigera.io/calico/latest/operations/upgrading/kubernetes-upgrade#all-other-upgrades

In particular, this step:

kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/operator-crds.yaml

Helm doesn't manage CRDs so there's always a manual step to upgrade CRDs before updating the chart.
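So the whole upgrade is roughly a two-step sequence, CRD apply first and chart upgrade second (release name and namespace taken from your helm command above):

kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/operator-crds.yaml
helm upgrade calico calico/tigera-operator -n tigera-operator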

Ergamon commented 3 weeks ago

Now you make me feel really dumb. Since no upgrade since 3.23.0 required any manual action, I just ran helm upgrade the way I have for every other upgrade over the last years.

I didn't even look at the documentation.

Works like a charm. Thank you very much.