strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

watchAnyNamespace flag leads to crash on operator startup. #5476

Closed mjrepo2 closed 3 years ago

mjrepo2 commented 3 years ago

Describe the bug: The watchAnyNamespace flag leads to a crash on operator startup.

To Reproduce: Use the Helm chart for Strimzi version 0.25.0, set watchAnyNamespace: true, and install Strimzi. No other changes were made to the default Helm chart.

Expected behavior: The operator starts and then watches all namespaces for Strimzi resources.

YAML files and logs

2021-08-25 09:55:56 INFO  ClusterOperator:78 - Creating ClusterOperator for namespace *
2021-08-25 09:55:56 INFO  ClusterOperator:94 - Starting ClusterOperator for namespace *
2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 WARN  WatcherWebSocketListener:73 - Exec Failure: HTTP 404, Status: 404 - 404 page not found

2021-08-25 09:55:56 ERROR Main:153 - Cluster Operator verticle in namespace * failed to start
io.fabric8.kubernetes.client.KubernetesClientException: 404 page not found

Hopefully I am missing something simple here, if any more information is needed please advise.

scholzj commented 3 years ago

HTTP 404 normally means that the CRDs are not installed. Can you double check that they are properly installed and in the right version (they need to have the v1beta2 version)?
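
A minimal sketch of that check, assuming kubectl access to the cluster. `has_crd_version` is a hypothetical helper name, not part of Strimzi or kubectl; it reads a whitespace-separated list of version names on stdin and succeeds only if the wanted version is among them.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or Strimzi).
# Reads whitespace-separated CRD version names from stdin and returns 0
# only if the version given as the first argument is one of them.
has_crd_version() {
    wanted="$1"
    # Put each name on its own line, then look for an exact, literal match.
    tr -s '[:space:]' '\n' | grep -qxF -- "$wanted"
}
```

Against a live cluster it could be fed with `kubectl get crd kafkas.kafka.strimzi.io -o jsonpath='{.spec.versions[*].name}' | has_crd_version v1beta2 && echo "v1beta2 present"`. The `[*]` in the JSONPath expression lists every version the CRD defines.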

mjrepo2 commented 3 years ago

Thanks for the quick reply - hopefully I am providing the right information:

kafkabridges.kafka.strimzi.io            2021-08-10T13:48:45Z
kafkaconnectors.kafka.strimzi.io         2021-08-10T13:48:45Z
kafkaconnects.kafka.strimzi.io           2021-08-10T13:48:44Z
kafkaconnects2is.kafka.strimzi.io        2021-08-10T13:48:44Z
kafkamirrormaker2s.kafka.strimzi.io      2021-08-10T13:48:45Z
kafkamirrormakers.kafka.strimzi.io       2021-08-10T13:48:45Z
kafkarebalances.kafka.strimzi.io         2021-08-10T13:48:45Z
kafkas.kafka.strimzi.io                  2021-08-10T13:48:44Z
kafkatopics.kafka.strimzi.io             2021-08-10T13:48:45Z
kafkausers.kafka.strimzi.io              2021-08-10T13:48:45Z

Labels:       app=strimzi
              component=kafkatopics.kafka.strimzi.io-crd
              strimzi.io/crd-install=true
Annotations:  <none>
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2021-08-10T13:48:45Z
  Generation:          1
  Managed Fields:
    API Version:  apiextensions.k8s.io/v1beta1

These are the CRDs that I have, but it seems the API version is v1 in some places and v1beta1 in others.

Like I said, I used the example Strimzi Helm chart and only changed the watchAnyNamespace flag. Did I make a mistake somewhere, or is this abnormal behaviour?

scholzj commented 3 years ago

Can you run something like this and share the output?

kubectl get crd kafkas.kafka.strimzi.io -o jsonpath='{.spec.versions[].name}'

mjrepo2 commented 3 years ago

Yes, and sorry for wrongly labeling the post!

kubectl get crd kafkas.kafka.strimzi.io -o jsonpath='{.spec.versions[].name}'
v1beta1

So it seems the Helm chart is installing the wrong API version of the CRDs? Have I made a mistake in the chart, or is this a bug? Here are the values I supplied:


# Default values for strimzi-kafka-operator.

# If you set `watchNamespaces` to the same value as `.Release.Namespace` (e.g. `helm ... --namespace $NAMESPACE`),
# the chart will fail because duplicate RoleBindings will be attempted to be created in the same namespace
watchNamespaces: []
watchAnyNamespace: true

image:
  registry: quay.io
  repository: strimzi
  name: operator
  tag: 0.25.0
logVolume: co-config-volume
logConfigMap: strimzi-cluster-operator
logLevel: ${env:STRIMZI_LOG_LEVEL:-INFO}
fullReconciliationIntervalMs: 120000
operationTimeoutMs: 300000
kubernetesServiceDnsDomain: cluster.local
featureGates: ""

tolerations: []
affinity: {}
annotations: {}
labels: {}
nodeSelector: {}

podSecurityContext: {}
securityContext: {}

# Docker images that operator uses to provision various components of Strimzi.  To use your own registry prefix the
# repository name with your registry URL.
# Ex) repository: registry.xyzcorp.com/strimzi/zookeeper
zookeeper:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
kafka:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
kafkaConnect:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
topicOperator:
  image:
    registry: quay.io
    repository: strimzi
    name: operator
    tag: 0.25.0
userOperator:
  image:
    registry: quay.io
    repository: strimzi
    name: operator
    tag: 0.25.0
kafkaInit:
  image:
    registry: quay.io
    repository: strimzi
    name: operator
    tag: 0.25.0
tlsSidecarEntityOperator:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
kafkaMirrorMaker:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
kafkaBridge:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka-bridge
    tag: 0.20.2
kafkaExporter:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
jmxTrans:
  image:
    registry: quay.io
    repository: strimzi
    name: jmxtrans
    tag: 0.25.0
kafkaMirrorMaker2:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
cruiseControl:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
tlsSidecarCruiseControl:
  image:
    registry: quay.io
    repository: strimzi
    name: kafka
    tagPrefix: 0.25.0
kanikoExecutor:
  image:
    registry: quay.io
    repository: strimzi
    name: kaniko-executor
    tag: 0.25.0
resources:
  limits:
    memory: 384Mi
    cpu: 1000m
  requests:
    memory: 384Mi
    cpu: 200m
livenessProbe:
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 30

# Override the docker registry used by all Strimzi images
imageRegistryOverride: ""
# Override the docker image repository used by all Strimzi images
imageRepositoryOverride: ""
# Override the docker image tag used by all Strimzi images
imageTagOverride: 0.25.0
createGlobalResources: true
# Override the pattern used to exclude some labels
labelsExclusionPattern: ""
# Controls whether Strimzi generates network policy resources (By default true)
generateNetworkPolicy: true
# Override the value for Connect build timeout
connectBuildTimeoutMs: 300000

scholzj commented 3 years ago

Right ... so for 0.25, the CRDs need to have the v1beta2 version. Yours only have v1beta1, so the operator's requests to the v1beta2 API return HTTP 404. That explains the error in the operator.

I don't really use or know Helm, so I'm not sure why it did what it did. Was it an upgrade or a first install? It looks to me like you had some old Strimzi CRDs already installed on your cluster when you installed 0.25. If you are sure that no other Strimzi installation is running in your cluster, the easiest way should be to delete the Strimzi CRDs and install those bundled with the 0.25.0 Helm Chart (you can also find the YAMLs here - just use the 04X-*.yaml files).

PS: Don't worry about the labels ... it is not always obvious what is a bug and what is not.
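
The cleanup described above could be sketched as follows. `strimzi_crds` is a hypothetical helper that filters the output of `kubectl get crd -o name` down to the `kafka.strimzi.io` group; it is an illustration, not an official Strimzi script. Deleting a CRD also deletes every custom resource of that kind, so this only makes sense when no other Strimzi installation is in use.

```shell
#!/bin/sh
# Hypothetical helper (not an official Strimzi script).
# Reads resource names from stdin, one per line, in the form printed by
# `kubectl get crd -o name`, and keeps only those whose name ends in the
# kafka.strimzi.io API group.
strimzi_crds() {
    grep -E '\.kafka\.strimzi\.io$'
}
```

One way to use it is to review the list first with `kubectl get crd -o name | strimzi_crds`, and only then pipe the same list into `xargs kubectl delete` before reinstalling the chart.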

mjrepo2 commented 3 years ago

It was a fresh install: I deleted the Strimzi Helm release, removed the namespace, and reinstalled. I did, however, have Strimzi running in the cluster before. I have just removed everything again and it seems the CRDs are still there, so this may indeed be the issue. I will remove the CRDs manually, reinstall fresh, and report back my findings.

Thank you for your time.

mjrepo2 commented 3 years ago

Removing the old CRDs seems to have solved the issue. Again, thanks for your time and help! I was expecting Helm to remove all resources associated with a deployment when I deleted it, but this wasn't the case.

scholzj commented 3 years ago

Yeah, I know some tools do not delete the CRDs when uninstalling the operator. I know OperatorHub.io leaves them behind, and Helm might as well. So that might have caused this.

pavankumar-go commented 3 years ago

@scholzj I installed just the Topic Operator and set STRIMZI_NAMESPACE = "*" to watch all namespaces, along with a ClusterRole and ClusterRoleBinding. I expected topics to be created when a KafkaTopic manifest is applied in any namespace, but I'm seeing a 404 in the Topic Operator:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.20.0.1/apis/kafka.strimzi.io/v1beta2/namespaces/*/kafkatopics. Message: namespaces "*" not found. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=namespaces, name=*, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=namespaces "*" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).

It's running the latest version, 0.25.0, and

kubectl get crd kafkatopics.kafka.strimzi.io -o jsonpath='{.spec.versions[].name}'
v1beta2

I even tried adding the required namespaces to the topic-operator deployment:

         - name: STRIMZI_NAMESPACE
           value: "qa,dev,integration,test-topic-operator"

scholzj commented 3 years ago

@pavankumar-go The Topic Operator does not support watching all namespaces. Kafka itself has no namespaces, so it is hard to map topics between Kubernetes namespaces and Kafka, and that is why this is not supported (at least not yet). This issue was about the Cluster Operator, which does support it.
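
As a rough pre-flight check, a value intended for the Topic Operator's STRIMZI_NAMESPACE could be validated before deploying. This sketch assumes the Topic Operator expects exactly one namespace name, which is an assumption drawn from the error above rather than something stated in the Strimzi documentation quoted here. `is_single_namespace` is a hypothetical helper that accepts only a valid Kubernetes namespace name (a DNS-1123 label of at most 63 characters), so `*` and comma-separated lists are rejected.

```shell
#!/bin/sh
# Hypothetical helper (not part of Strimzi).
# Returns 0 only if the argument looks like one Kubernetes namespace name:
# lowercase alphanumerics and '-', starting and ending with an alphanumeric,
# at most 63 characters. "*" and "a,b" style lists therefore fail.
is_single_namespace() {
    value="$1"
    [ "${#value}" -le 63 ] || return 1
    printf '%s\n' "$value" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}
```

For example, `is_single_namespace "qa,dev,integration,test-topic-operator" || echo "not a single namespace"` would flag the value tried above.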

pavankumar-go commented 3 years ago

@scholzj Thank you for the quick reply. Understood