strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Cruise Control UnknownHostException #4886

Closed (imalik8088 closed this issue 3 years ago)

imalik8088 commented 3 years ago

Describe the bug After adding the KafkaRebalance (Cruise Control) resource, I get an UnknownHostException in its status, although topics could be created and metrics were built. Running kubectl describe kafkarebalance my-cluster shows the UnknownHostException in the status.

To Reproduce Steps to reproduce the behavior:

  1. Add cruiseControl: {} to an existing cluster
  2. Run command 'kubectl describe kafkarebalance xxxx'
  3. See error

Expected behavior The status should show the optimization proposal from Cruise Control, for example:

Status:
  Conditions:
    Last Transition Time:  2021-04-30T11:44:49.074773Z
    Status:                True
    Type:                  ProposalReady
  Observed Generation:     1
  Optimization Result:
    Data To Move MB:  108
    Excluded Brokers For Leadership:

Environment:

YAML files and logs

Additional context Solved by deleting the resource with kubectl delete kafkarebalance xxx and reapplying the YAML file:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  labels:
    argocd.argoproj.io/instance: xxxxx
    strimzi.io/cluster: xxxxxxxx
  name: xxxxx
  namespace: xxxxxx
spec: {}

That's what I was getting from kubectl describe kafkarebalance before reapplying:

status:
  conditions:
  - lastTransitionTime: "2021-05-03T09:51:00.590323Z"
    message: 'failed to resolve ''xxxxxxx.svc''
      after 2 queries '
    reason: UnknownHostException
    status: "True"
    type: NotReady
  observedGeneration: 1

scholzj commented 3 years ago

That looks like some DNS issue. Have you tried to restart the Cruise Control pod?

imalik8088 commented 3 years ago

Wow lightning fast feedback :) thx
I've deleted the pod so that k8s recreates it, but this didn't help.

scholzj commented 3 years ago

Hmm, that is weird. The error basically says that it cannot resolve the service in DNS. I guess you masked out the service name, so it is hard to say if it is the right one. Either something is deleting the service, which is why it does not resolve, or this is some kind of DNS issue. Maybe you can check whether the service resolves from other pods (for example, you can use the Kuard tool to check if your DNS works).
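
The two possibilities above (a missing service versus a DNS fault) can be told apart with a quick check from inside the cluster. The following is a minimal sketch, not a definitive procedure. It assumes the Cruise Control service follows the <cluster>-cruise-control naming pattern, which may not match the masked name in this report. The function names, the dns-check pod name, and the busybox:1.28 image are illustrative choices that do not come from this thread.

```shell
#!/bin/sh
# Sketch only: the names below are assumptions, adjust them to your cluster.

# Build the in-cluster DNS name of the Cruise Control service.
# Assumes the service is named <cluster>-cruise-control.
cc_service_host() {
  printf '%s-cruise-control.%s.svc\n' "$1" "$2"
}

# Check that the service exists, then try to resolve it from a throwaway pod.
check_cc_dns() {
  cluster=$1
  namespace=$2
  host=$(cc_service_host "$cluster" "$namespace")

  # A missing service would explain the failed lookup without any DNS fault.
  kubectl get service "${cluster}-cruise-control" --namespace "$namespace" || return 1

  # Temporary pod (removed when it exits) that runs nslookup against the name.
  kubectl run dns-check --namespace "$namespace" --rm -i --restart=Never \
    --image=busybox:1.28 -- nslookup "$host"
}

# Example (hypothetical cluster and namespace names):
#   check_cc_dns my-cluster kafka
```

If the first command fails, the service itself is missing. If the service exists but the lookup fails, the cluster DNS is the more likely culprit. Because the error is reported in the KafkaRebalance status, it may also be worth repeating the lookup from the namespace where the operator runs.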