operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0

409 Conflict When Reconciling Pod with Sidecar Injection #2076

Closed coltmcnealy-lh closed 6 months ago

coltmcnealy-lh commented 11 months ago

Bug Report

Reconciling a Pod with a CRUDKubernetesDependentResource yields a 409 after sidecar injection via MutatingAdmissionWebhook.

What did you do?

Our operator deals with Pods manually through a CRUDKubernetesDependentResource (bulk dependent, to be precise). We are testing in an Istio-enabled cluster which injects a sidecar into the pod through an admission webhook.

What did you expect to see?

I expected to see the operator create the pod, the sidecar get injected, and then since I didn't change any spec on the pod itself in the desiredResources() method, future reconciliation loops should not try to remove any containers.
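To make that expectation concrete, here is a minimal sketch of the difference between a strict comparison and one that tolerates injected containers. It is an illustration only, not how JOSDK matches resources. The class name, method names, and the `server` container name are hypothetical; `istio-proxy` is the sidecar name that appears in the stack trace.

```java
import java.util.List;

public class SidecarTolerantCompare {

    // Strict comparison: any container added by a webhook makes the lists differ.
    static boolean strictEquals(List<String> desiredNames, List<String> actualNames) {
        return desiredNames.equals(actualNames);
    }

    // Tolerant comparison: every desired container must be present in the
    // actual pod, but extra (injected) containers are ignored.
    static boolean desiredIsSubset(List<String> desiredNames, List<String> actualNames) {
        return actualNames.containsAll(desiredNames);
    }

    public static void main(String[] args) {
        List<String> desired = List.of("server");               // what the operator asks for
        List<String> actual = List.of("server", "istio-proxy"); // after sidecar injection

        System.out.println("strict match:   " + strictEquals(desired, actual));
        System.out.println("tolerant match: " + desiredIsSubset(desired, actual));
    }
}
```

With a strict comparison the pod looks out of date on every reconciliation, which triggers an update; the tolerant check treats the injected sidecar as irrelevant to the operator's desired state.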

What did you see instead? Under which circumstances?

We saw a 409 error because the JOSDK tried to apply the spec we provided without the injected sidecar container and init container. The stacktrace is shown below.

16:56:37 ERROR [LH] LHClusterReconciler - Unexpected error on LHCluster basic-tls
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://127.0.0.1:42795/api/v1/namespaces/littlehorse/pods/basic-tls-server-2?fieldManager=lhclusterreconciler&force=true. Message: Pod "basic-tls-server-2" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
    Volumes:        {{Name: "workload-socket", VolumeSource: {EmptyDir: &{}}}, {Name: "credential-socket", VolumeSource: {EmptyDir: &{}}}, {Name: "workload-certs", VolumeSource: {EmptyDir: &{}}}, {Name: "istio-envoy", VolumeSource: {EmptyDir: &{Medium: "Memory"}}}, ...},
    InitContainers: {{Name: "istio-init", Image: "docker.io/istio/proxyv2:1.16.

// TRUNCATED

            },
            StartupProbe: nil,
            Lifecycle:    nil,
            ... // 7 identical fields
        },
        {Name: "istio-proxy", Image: "docker.io/istio/proxyv2:1.16.2", Args: {"proxy", "sidecar", "--domain", "$(POD_NAMESPACE).svc.cluster.local", ...}, Ports: {{Name: "http-envoy-prom", ContainerPort: 15090, Protocol: "TCP"}}, ...},
    },
    EphemeralContainers: nil,
    RestartPolicy:       "Always",
    ... // 26 identical fields
  }
, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238) ~[kubernetes-client-api-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:518) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:430) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:408) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handlePatch(BaseOperation.java:713) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:232) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:237) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:252) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1132) ~[kubernetes-client-6.7.2.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:92) ~[kubernetes-client-6.7.2.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependentResource.update(KubernetesDependentResource.java:153) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.kubernetes.CRUDKubernetesDependentResource.update(CRUDKubernetesDependentResource.java:16) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.BulkDependentResourceReconciler$BulkDependentResourceInstance.update(BulkDependentResourceReconciler.java:92) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.AbstractDependentResource.handleUpdate(AbstractDependentResource.java:143) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.AbstractDependentResource.reconcile(AbstractDependentResource.java:72) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.BulkDependentResourceReconciler.lambda$reconcile$0(BulkDependentResourceReconciler.java:39) ~[operator-framework-core-4.4.4.jar:?]
    at java.util.HashMap.forEach(HashMap.java:1421) ~[?:?]
    at io.javaoperatorsdk.operator.processing.dependent.BulkDependentResourceReconciler.reconcile(BulkDependentResourceReconciler.java:35) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.dependent.AbstractDependentResource.reconcile(AbstractDependentResource.java:52) ~[operator-framework-core-4.4.4.jar:?]
    at io.littlehorse.operator.cluster.LHClusterReconciler.reconcile(LHClusterReconciler.java:134) [app-0.5.0.jar:?]
    at io.littlehorse.operator.cluster.LHClusterReconciler.reconcile(LHClusterReconciler.java:41) [app-0.5.0.jar:?]
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:152) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:110) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:219) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:109) ~[operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:140) [operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:121) [operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91) [operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64) [operator-framework-core-4.4.4.jar:?]
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:417) [operator-framework-core-4.4.4.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://127.0.0.1:42795/api/v1/namespaces/littlehorse/pods/basic-tls-server-2?fieldManager=lhclusterreconciler&force=true. Message: Pod "basic-tls-server-2" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{

Environment

KIND cluster, JOSDK 4.4.+.

csviri commented 11 months ago

Hi @coltmcnealy-lh, thanks for the bug report. We will have to take a look at how to solve this in a generic way. If you could provide a simplified project to reproduce it, that would help a lot. Until then, switching to a non-SSA update (this might need a custom matcher, but I'm not sure) should fix this. (Note that in the upcoming release, SSA handling can be switched off per dependent resource.)

coltmcnealy-lh commented 11 months ago

Thanks @csviri. I am traveling this week so I will create a minimal reproducible example (istioctl and kind as dependencies) on Saturday morning.

coltmcnealy-lh commented 11 months ago

@csviri I think this may actually be a layer-8 issue. I overrode this method:

public Result<Pod> match(Pod actualResource, Pod desired, LHCluster lhc, Context<LHCluster> ctx) {}

For various reasons, we have to store the "last-applied spec" in the annotations. I just compared the annotations in actualResource and desired. Now things appear to work as expected.
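A minimal sketch of that comparison, assuming the last-applied spec is stored as a string under a single annotation key. The key `example.com/last-applied-spec` and the class and method names are hypothetical and not taken from the actual operator. The sketch only computes the boolean; wrapping it into the `Result<Pod>` returned by `match` is left out.

```java
import java.util.Map;

public class LastAppliedAnnotationMatch {

    // Hypothetical annotation key; the real operator's key is not given in this thread.
    static final String LAST_APPLIED_KEY = "example.com/last-applied-spec";

    // Compares only the last-applied annotation. Anything a webhook adds to the
    // pod (extra annotations, containers, volumes) is ignored by this check.
    static boolean matches(Map<String, String> actualAnnotations,
                           Map<String, String> desiredAnnotations) {
        String actual = actualAnnotations == null ? null : actualAnnotations.get(LAST_APPLIED_KEY);
        String desired = desiredAnnotations == null ? null : desiredAnnotations.get(LAST_APPLIED_KEY);
        // A desired resource without the annotation never matches, so it gets reconciled.
        return desired != null && desired.equals(actual);
    }

    public static void main(String[] args) {
        Map<String, String> desired = Map.of(LAST_APPLIED_KEY, "spec-hash-1");
        // The actual pod carries an extra annotation, as a webhook might add.
        Map<String, String> actual = Map.of(
                LAST_APPLIED_KEY, "spec-hash-1",
                "injected.example.com/status", "done");

        System.out.println("matched: " + matches(actual, desired));
    }
}
```

Because the check looks at one annotation the operator itself controls, a mutated pod spec does not register as drift, and no update is attempted against the pod's immutable fields.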

I also put together a simple example in my own GitHub that just creates a hello-world NGINX pod. That one actually worked just fine, but it didn't have the crazy annotation-based processing (we do this internally because we needed to do a rolling restart and a smooth scale-down without using a StatefulSet or similar).

github-actions[bot] commented 9 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 14 days with no activity.