kubernetes-sigs / gateway-api

Repository for the next iteration of composite service (e.g. Ingress) and load balancing APIs.
https://gateway-api.sigs.k8s.io
Apache License 2.0

GatewayAPI CRDs failed to upgrade from 0.6 to 1.0. #2671

Closed shawnho1018 closed 6 months ago

shawnho1018 commented 10 months ago

What happened: When I tried to upgrade the CRDs from 0.6.0 all the way to 1.0.0, I encountered the following error:

The CustomResourceDefinition "gatewayclasses.gateway.networking.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha2": must appear in spec.versions

What you expected to happen: kubectl apply should be executed correctly with the following message:

customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io configured

How to reproduce it (as minimally and precisely as possible): First apply 0.8.0 and then 1.0.0, as follows:

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/release-0.8/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/release-1.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml

Anything else we need to know?: My K8s Cluster is v1.27.6-gke.2500
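The error above says that a version recorded in the CRD's status.storedVersions is missing from the spec.versions of the manifest being applied. A minimal sketch of a pre-upgrade check follows. The helper name `missing_stored_versions` is hypothetical, and the target version names passed to it are placeholders that should be read from the manifest you plan to apply.

```shell
# Hypothetical helper: print every stored version of a CRD that is NOT in the
# list of versions the new manifest declares (arguments after the CRD name).
missing_stored_versions() {
  crd="$1"
  shift
  # jsonpath [*] on a string array prints the elements separated by spaces.
  for stored in $(kubectl get crd "$crd" -o jsonpath='{.status.storedVersions[*]}'); do
    found=no
    for target in "$@"; do
      if [ "$stored" = "$target" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$stored"
    fi
  done
  return 0
}

# Example call (version names are assumptions, check the manifest first):
#   missing_stored_versions gatewayclasses.gateway.networking.k8s.io v1 v1beta1
```

Any version this prints would be rejected with a "must appear in spec.versions" error when the new CRD is applied.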

robscott commented 10 months ago

Hey @shawnho1018, thanks for reporting this! I've tried recreating this using the steps provided (and also installing a GatewayClass) and was not able to. I think the most likely cause of this would be that you have an old GatewayClass in your cluster that is stored using v1alpha2 (the API version that was defined as the storage version in the CRD the last time it was updated). This is a huge gap in our documentation, and unfortunately the relevant upstream docs seem to conflate CRD authors with users. I've filed https://github.com/kubernetes-sigs/gateway-api/issues/2672 to track that.

To fix your problem in the short term, I'd recommend no-op updates (for example an empty kubectl patch on the resource) on any Gateway API resources that may not have been modified since you upgraded to Gateway API v0.6+. (I'd expect GatewayClasses are a prime candidate for this since they're so rarely changed). You can also run https://github.com/kubernetes-sigs/kube-storage-version-migrator to automate this process for you.
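The manual route suggested above could be sketched roughly as follows. Both function names are hypothetical. The sketch assumes that an empty merge patch makes the API server rewrite the object in the current storage version (as the comment suggests), and that the installed kubectl supports the `--subresource` flag on `kubectl patch`. Pruning status.storedVersions afterwards follows the upstream CRD versioning guidance and is not something this thread spells out, so verify it against your cluster before relying on it.

```shell
# Hypothetical helper: send an empty merge patch to every object of a
# resource type, for both namespaced and cluster-scoped resources.
noop_patch_all() {
  resource="$1"
  kubectl get "$resource" --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' |
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    ns="${line%%/*}"    # empty for cluster-scoped objects such as GatewayClass
    name="${line#*/}"
    if [ -n "$ns" ]; then
      kubectl patch "$resource" "$name" -n "$ns" --type=merge -p '{}' </dev/null
    else
      kubectl patch "$resource" "$name" --type=merge -p '{}' </dev/null
    fi
  done
}

# Hypothetical helper: once every object has been rewritten, record a single
# remaining storage version in the CRD status. The version is a parameter
# because the right value depends on the CRD currently installed.
set_stored_versions() {
  crd="$1"
  version="$2"
  kubectl patch crd "$crd" --subresource=status --type=merge \
    -p "{\"status\":{\"storedVersions\":[\"$version\"]}}"
}

# Example calls (the storage version shown is an assumption):
#   noop_patch_all gatewayclasses.gateway.networking.k8s.io
#   set_stored_versions gatewayclasses.gateway.networking.k8s.io v1beta1
```

kube-storage-version-migrator, linked above, automates these steps, so the manual sketch is mainly useful on small clusters.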

shawnho1018 commented 10 months ago

@robscott Thanks for your prompt response. My earlier description (apply -f gatewayclass.yaml) was not accurate; sorry for the confusion. Allow me to restate the reproduction steps.

  1. Create a new K8S cluster (I use GKE on-prem - GKE version 1.27.3)
  2. kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.6.0/standard-install.yaml
  3. Then, kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml. You will see several "field is immutable" errors. A sample error is shown below:
    Resource: "batch/v1, Resource=jobs", GroupVersionKind: "batch/v1, Kind=Job"
    Name: "gateway-api-admission-patch", Namespace: "gateway-system"
    for: "https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.7.0/standard-install.yaml": error when patching "https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.7.0/standard-install.yaml": Job.batch "gateway-api-admission-patch" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"gateway-api-admission-patch", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"batch.kubernetes.io/controller-uid":"00c4b80e-a26e-4340-a761-b7a0910c9e7b", "batch.kubernetes.io/job-name":"gateway-api-admission-patch", "controller-uid":"00c4b80e-a26e-4340-a761-b7a0910c9e7b", "job-name":"gateway-api-admission-patch", "name":"gateway-api-webhook"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:"patch", Image:"registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.1.1", Command:[]string(nil), Args:[]string{"patch", "--webhook-name=gateway-api-admission", "--namespace=gateway-system", "--patch-mutating=false", "--patch-validating=true", "--secret-name=gateway-api-admission", "--patch-failure-policy=Fail"}, WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar{core.EnvVar{Name:"POD_NAMESPACE", Value:"", ValueFrom:(*core.EnvVarSource)(0xc02854d5e0)}}, Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil), Claims:[]core.ResourceClaim(nil)}, ResizePolicy:[]core.ContainerResizePolicy(nil), VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), 
ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"OnFailure", TerminationGracePeriodSeconds:(*int64)(0xc02c8908d8), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"gateway-api-admission", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc028d1cea0), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", SetHostnameAsFQDN:(*bool)(nil), Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil), OS:(*core.PodOS)(nil), SchedulingGates:[]core.PodSchedulingGate(nil), ResourceClaims:[]core.PodResourceClaim(nil)}}: field is immutable

BTW, I appreciate your new ticket #2672, since that is the real upgrade problem for existing Gateway API users who cannot use 1.0 with an existing GatewayClass. #2672 is a separate issue from the one I filed here.

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

robscott commented 6 months ago

Sorry for the delayed response here. This specific error was caused because the name of the job we used to set up TLS for our validating webhook was not randomized. As a result, an updated install tried to update the job left over from a previous installation, which fails because a Job's spec.template is immutable. All of this is irrelevant now that we've removed the validating webhook entirely.
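For anyone still on a release that ships the webhook, one possible workaround is to delete the leftover job before re-applying the install manifest. This is a sketch only and is not a fix recommended in this thread. The function name `reinstall_with_job_cleanup` is hypothetical; the job name and namespace are taken from the error output earlier in the thread.

```shell
# Hypothetical helper: remove the fixed-name admission job from a previous
# install, then apply the new manifest so the job is created fresh instead of
# being updated in place.
reinstall_with_job_cleanup() {
  manifest="$1"
  kubectl delete job gateway-api-admission-patch -n gateway-system --ignore-not-found &&
    kubectl apply -f "$manifest"
}

# Example call:
#   reinstall_with_job_cleanup https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
```

Other jobs in the same manifest may need the same treatment; only the job named in the error above is handled here.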