strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Bug] Operator Rolling Update is triggered when resource is not modified #2558

Closed. mhaddon closed this issue 4 years ago.

mhaddon commented 4 years ago

Describe the bug

It seems that if the operator thinks that the Kafka resource has been changed, it triggers a rolling update of the resource. As hinted in the documentation: "In most cases, the Cluster Operator only updates your Kafka or ZooKeeper clusters in response to changes to the corresponding Kafka resource. This enables you to plan when to apply changes to a Kafka resource to minimize the impact on Kafka client applications."

The problem I am having is that even if the resource does not change, the rolling update is triggered.

I am using kustomize and have my "kind: Kafka" resource in the same kustomize build as another resource. If the other resource is updated, the Strimzi Operator thinks it needs to perform a rolling update of Kafka, even if Kafka itself is not modified. Looking at the YAML output of kustomize, I cannot even see a difference in the YAML.

This is a big problem because I am using GitOps to provision my environment and therefore I am getting a rolling update about every 5-10 minutes...

To Reproduce

Steps to reproduce the behavior:

  1. Put Kafka in the same Kustomize build as another resource
  2. Edit the other resource
  3. kustomize build . | kubectl apply -f -
  4. Kafka is forced to rolling update

Expected behavior

A rolling update only happens if I specifically edit the Kafka resource... or at least give me the ability to turn off the rolling update functionality.

Environment (please complete the following information):

YAML files and logs

The logs just keep saying:

2020-02-16 22:06:02 INFO  PodOperator:65 - Rolling update of test/kafka-zookeeper: Rolling pod kafka-zookeeper-0
2020-02-16 22:06:48 INFO  PodOperator:65 - Rolling update of test/kafka-kafka: Rolling pod kafka-kafka-0

My Kafka deployment:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.4"
      delete.topic.enable: "true"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 10Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

scholzj commented 4 years ago

Can you please share the whole log from the Cluster Operator? Ideally with DEBUG log level switched on? (by setting the environment variable STRIMZI_LOG_LEVEL to DEBUG in the Operator deployment)
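
For reference, a minimal sketch of where that variable goes, assuming the default install manifests (the Deployment and container names may differ in your setup, and only the relevant fields are shown):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            # Raises the Cluster Operator log level; remove or set back to INFO afterwards
            - name: STRIMZI_LOG_LEVEL
              value: DEBUG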

mhaddon commented 4 years ago

Here are 6 minutes' worth of logs; it has already performed a lot of rolling updates, so hopefully there is enough info.

operatorlogs.txt

mhaddon commented 4 years ago

It also seems that if I wait a while and just try re-applying the Kafka YAML file from the terminal:

kustomize build . | kubectl apply -f -
kafka.kafka.strimzi.io/kafka configured
kafkatopic.kafka.strimzi.io/ingress-ship-event configured
kafkatopic.kafka.strimzi.io/test configured

It gets listed as being configured, which is probably what triggers the operator, even though nothing has actually changed.
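
As a quick sanity check of whether the apply would actually change anything server-side, something like this should show the pending diff (assuming a kubectl version that already ships the diff subcommand):

kustomize build . | kubectl diff -f -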

You know, now that I am playing around with it, I think this is the real problem, and not the editing of an unrelated resource...

The Kustomize currently looks like this:

kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: team-test

bases: 
 - ./kafka/

commonLabels:
  env: prod
  domain: team-test

./kafka/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - kafka.yaml

./kafka/kafka.yaml

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.4"
      delete.topic.enable: "true"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 10Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

I am going to see if it is even related to kustomize, though, and whether I can reproduce it with just kubectl apply... I am wondering if kustomize is randomly re-ordering the labels, which causes the problem.

scholzj commented 4 years ago

I'm not sure it is kustomize that is changing the resource. You seem to be adding some labels to it:

2020-02-16 23:01:32 DEBUG StatefulSetDiff:97 - StatefulSet team-test/kafka-zookeeper differs: {"op":"remove","path":"/metadata/labels/fluxcd.io~1sync-gc-mark"}
2020-02-16 23:01:32 DEBUG StatefulSetDiff:98 - Current StatefulSet path /metadata/labels/fluxcd.io~1sync-gc-mark has value 
2020-02-16 23:01:32 DEBUG StatefulSetDiff:99 - Desired StatefulSet path /metadata/labels/fluxcd.io~1sync-gc-mark has value 
2020-02-16 23:01:32 DEBUG StatefulSetDiff:81 - StatefulSet team-test/kafka-zookeeper ignoring diff {"op":"remove","path":"/spec/revisionHistoryLimit"}
2020-02-16 23:01:32 DEBUG StatefulSetDiff:81 - StatefulSet team-test/kafka-zookeeper ignoring diff {"op":"remove","path":"/spec/template/metadata/annotations/strimzi.io~1generation"}
2020-02-16 23:01:32 DEBUG StatefulSetDiff:97 - StatefulSet team-test/kafka-zookeeper differs: {"op":"remove","path":"/spec/template/metadata/labels/fluxcd.io~1sync-gc-mark"}
2020-02-16 23:01:32 DEBUG StatefulSetDiff:98 - Current StatefulSet path /spec/template/metadata/labels/fluxcd.io~1sync-gc-mark has value 
2020-02-16 23:01:32 DEBUG StatefulSetDiff:99 - Desired StatefulSet path /spec/template/metadata/labels/fluxcd.io~1sync-gc-mark has value 

That is one of the things which is triggering the restart. Additionally, it seems to list the ancillary ConfigMap as changed. I'm not sure what exactly that means in your case, because I'm not sure what code you actually use: you say you are using the master branch, but this message is not in the latest master branch.

scholzj commented 4 years ago

As for the resources being reported as configured, I have no idea what your kustomize build is changing. You will need to compare the resource before and after to see what exactly changed.

mhaddon commented 4 years ago

I am using 0.16.2 of the operator.

I think I know what the problem is, now I need to figure out what to do about it.

I did this:

kubectl get kafka -o yaml -n team-test > a && kustomize build env/prod/ > b && kustomize build env/prod/ | kubectl apply -f -

Then I compared the output of the a and b files, and I see something very obvious.

If you use FluxCD, the GitOps CD tool by Weaveworks, then it adds its own annotations and labels to the deployment:

annotations:
  fluxcd.io/sync-checksum: cee6605a3f351637b5a60ba4e28c87cb9e5e78d9
labels:
  fluxcd.io/sync-gc-mark: sha256.lFncy9Bn7eb87iLUySM2398Zx2GPXE0752HbwgTMRxE

This would therefore also update the kubectl.kubernetes.io/last-applied-configuration annotation.
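
For anyone else digging into this, one way to inspect what kubectl recorded on the previous apply is the view-last-applied subcommand (the resource and namespace names here just follow the examples in this thread):

kubectl apply view-last-applied kafka/kafka -n team-test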

scholzj commented 4 years ago

So the weird thing is why the annotation / label is changing when the resource itself does not change. I would not expect that to happen. We can probably blacklist any labels and annotations starting with fluxcd.io/sync. But TBH I'm not entirely sure what they mean and what they do, so I'm not sure whether that could have any side-effects.

mhaddon commented 4 years ago

https://docs.fluxcd.io/en/1.18.0/references/garbagecollection.html https://github.com/fluxcd/flux/blob/master/pkg/cluster/kubernetes/sync.go#L41

It looks like they are used so FluxCD knows which resource was created from which git commit and which commit Flux last synced with.

They do not seem to be intended to impact the actual running of the resource, so I do not think there is a downside to blacklisting them. Remember, though, that the kubectl.kubernetes.io/last-applied-configuration annotation would also change.

So I did some testing, and FluxCD does not change the annotations/labels on every commit; it notices they are unchanged. The issue was that I was sometimes provisioning the resource without FluxCD, which would then purge the FluxCD annotations/labels. Strimzi is the only thing I have noticed this with so far, but I think it is my fault for using FluxCD incorrectly. It seems that if you use FluxCD to provision things, you really only want to provision that stuff with FluxCD, otherwise it loses the ability to control the resource.

Thanks for the help. It might be worth ignoring those annotations/labels as you suggested, but I should probably just use the tools correctly.

scholzj commented 4 years ago

So do you think you need these labels / annotations blacklisted? Or will you be able to handle it differently? I do not think I will blacklist them if you have another solution.

mhaddon commented 4 years ago

Personally I will just make FluxCD poll git more often, and I will let FluxCD make the changes for me. That is good enough for me.
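
For completeness, a rough sketch of where the polling interval lives, assuming Flux v1's daemon flags (the Deployment and container names follow a typical Flux v1 install and may differ; only the relevant fields are shown):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux
spec:
  template:
    spec:
      containers:
        - name: flux
          args:
            # How often fluxd polls the git repository (Flux v1 default is 5m)
            - --git-poll-interval=1m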

It might be something to consider, though, because as GitOps CD pipelines become more popular, and they mutate the definitions of the resources they create, more people might run into issues like this in the future.

I have closed the ticket though because I do not feel any immediate work is necessary.

vl4deee11 commented 3 years ago

@mhaddon have you already solved this problem?