Closed kangzhang closed 3 years ago
What version of Strimzi do you use? main
doesn't mean anything - it can be build which is several months old as well as build from yesterday. Did you tried it with 0.23.0 or 0.24.0 versions?
BTW: I gave it a try with latest main
build and it seems to work fine for me (using the regex from above as well as the labels).
Also, just to clarify - we are talking about the labels set by Flux on the Kafka custom resource. Not on the pods or stateful resources, right?
Thanks for taking a look! We installed it using this link: https://strimzi.io/install/latest?namespace=kafka which I suppose was the main branch and lastest version.
By labels I meant /spec/template/metadata/labels/kustomize.toolkit.fluxcd.io~1name
on StatefulSet https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L246
I suppose that's what triggered the rolling update at https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L267 .
Not sure if my reading of the logs is accurate, maybe some other change caused ZookeeperSetOperator to do the updates?
Thanks for taking a look! We installed it using this link: https://strimzi.io/install/latest?namespace=kafka which I suppose was the main branch and lastest version.
Ok, so you are not using main
version but the latest stable release (0.24.0 in this case it seems).
By labels I meant /spec/template/metadata/labels/kustomize.toolkit.fluxcd.io~1name on StatefulSet https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L246
I'm not sure I follow this. So where does FluxCD set the labels? On the Kafka
CR? Or on the StatefulSet
/ Pods
? The STRIMZI_LABELS_EXCLUSION_PATTERN
applies only to the Kafka CR. So if FluxCD would set these labels by on other resources, it would not apply to it. But FluxCD should have no reason to label resources it doesn't manage.
Keep also in mind that all labels not matching STRIMZI_LABELS_EXCLUSION_PATTERN
from the Kafka CR are mirrored on all resources. This is good in many use-cases. But this is what causes the rolling updates. When you set the STRIMZI_LABELS_EXCLUSION_PATTERN
env var to exclude the kustomize.toolkit.fluxcd.io
labels, the first think it will do is remove the labels from all resources. So if the cluster was deployed first with these labels and then you exclude them, the first reconciliation will remove them from the resources (via rolling update). However, any subsequent changes to the labels on the Kafka
CR should not trigger anything anymore because the labels are not mirrored anymore. From the logs you shared, it is not clear if what you have there is just the first roll removing the labels or a subsequent roll which changed the label.
The JSON patch in the log:
2021-07-01 18:36:19 DEBUG StatefulSetDiff:103 - Reconciliation #1(watch) Kafka(kafka/kafka): StatefulSet kafka/kafka-zookeeper differs: {"op":"remove","path":"/spec/template/metadata/labels/kustomize.toolkit.fluxcd.io~1name"}
says the op
is remove
. So maybe it was just the first roll which removed the labels and the next changes to the labels on the Kafka
CR will not trigger anything anymore? Can you please try it?
@scholzj OMG you nailed it. It was the first roll which triggered the updates. This totally makes sense now. I'll do a test and report back!
Ok, please give it a try and let me know.
Also, since you are using FluxCD - is there some Flux documentation listing all the different labels they use? It might make sense to add them to the default list to try to avoid this in the first place.
The main labels are a by-product of garbage collection, listed here: https://fluxcd.io/docs/components/kustomize/kustomization/#garbage-collection
The Flux process is on reconciling a change to the source repository it marks any resources with these labels, so that if later the item is removed from the source, it knows which items to GC
hey @schmichri - I just tested it and it worked great! Flux no longer triggers restart. Really appreciated the help and responsiveness!
In case other people run into the same issue. This is the patch we use to get it working:
# exclude flux lables, so that they don't trigger kafka restart
apiVersion: apps/v1
kind: Deployment
metadata:
name: strimzi-cluster-operator
labels:
app: strimzi
namespace: kafka
spec:
template:
spec:
containers:
- name: strimzi-cluster-operator
env:
- name: STRIMZI_LABELS_EXCLUSION_PATTERN
# exclude flux lables to avoid unnecessary restart
value: "(^app.kubernetes.io/(?!part-of).*|^kustomize.toolkit.fluxcd.io.*)
Agreed that it would be great to include the fluxcd regex in default value for STRIMZI_LABELS_EXCLUSION_PATTERN . The link nick linked had more about the tags - it matches the labels I previously saw on config maps:
kustomize.toolkit.fluxcd.io/checksum=77dd8897619d50e2035bafbfe0b41e22cea893cd
kustomize.toolkit.fluxcd.io/name=environment
kustomize.toolkit.fluxcd.io/namespace=flux-system
I'm closing this. Thanks again!
FYI: I opened #5245 to exclude these by default.
Love it!
Describe the bug We started using strimz-kafka-operator recently and it was overall smooth (thank you!).
We ran into some problems with fluxcd (The GitOps tool we use https://fluxcd.io/): strimz-kafka-operator would restart kafka sometimes when we push a change. Looking a bit into it and it seems that some lables added by fluxcd, triggered strimz-kafka-operator to do a rolling update.
Expected behavior strimzi-kafka-operator ignore all fluxcd tags.
Environment (please complete the following information):
YAML files and logs
This is the log from the operator: https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6 . You can see that the operator detected some label changes by fluxcd: https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L244 and triggered a rolling update: https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L267
Additional context
I'm wondering if we could black list fluxcd labels? The are only used by fluxcd to track the deployment and should not be used by strimzi.
Here are some example labels fluxcd would add:
I also read thru #2558 and #5083 . The flag
STRIMZI_LABELS_EXCLUSION_PATTERN
does not seem to help. (I already set to"(^app.kubernetes.io/(?!part-of).*|^kustomize.toolkit.fluxcd.io.*)"
and you can see it in log file too: https://gist.github.com/kangzhang/13cec69ce30079b94d4f3bf41631d6a6#file-gistfile1-txt-L69Thanks!