lsst-sqre / strimzi-registry-operator

A Kubernetes Operator for running the Confluent Schema Registry with a Strimzi-based Kafka cluster
MIT License
81 stars 17 forks source link

Get cluster's bootstrap address from Service, not Listener #10

Closed spenczar closed 3 years ago

spenczar commented 3 years ago

So, I tried deploying the operator on a local minikube, and saw it had errors. It was failing because of a KeyError when it tried to look up the bootstrap address of a Kafka cluster. This fixes that by changing how we look up bootstrap addresses.

old behavior

Previously, Strimzi Registry Operator inspected the status.listeners field of the Kafka resource that it's watching. It would iterate over the listeners, looking for one with type: tls. However, at least as far back as I can see in docs, the type of a listener is internal, loadbalancer, nodeport, or ingress (or route, for OpenShift instead of kubernetes).

new behavior

I don't know how this had been working, but I think it's preferable to use the exposed Service that Strimzi creates. The Strimzi docs say they create a service with this name pattern:

cluster-name-kafka-bootstrap

Service can be used as bootstrap servers for Kafka clients connecting from within the Kubernetes cluster.

A service gets a DNS entry in kubernetes - it will be available at <service-name>.<namespace>.svc. When a pod resolves that DNS, it'll get the right IP address automagically.

So, just by knowing the cluster name and namespace, we already have a valid hostname for reaching the broker's bootstrap listener. The only other thing we need is a port, so I wrote some code to query for the Service and look for an internal TCP port that it's listening on.

I think this is a bit more robust. The only thing I'm uncertain of is whether the port will always be named tcp-internal, but I think it will.

jonathansick commented 3 years ago

This was targetting strimizi 0.14, so I imagine it's changed. Using the service sounds like a better interface, so let's go for it.

It looks like there's a flake8 error to resolve?

spenczar commented 3 years ago

Yeah, I think we're seeing https://github.com/tholo/pytest-flake8/issues/81. We need to pin more versions for CI, I guess. I'll do that in this PR.

spenczar commented 3 years ago

After a bit of helpful discussion with @scholzj over in the CNCF slack, I now have by doubts on this general approach, since there's no guarantee that tcp-internal is a port name that will exist on the service.

Instead, @scholzj suggested that the user could provide the listener name that we should use (probably inside the spec for the StrimziSchemaRegistry custom resource). We can query for the listener by name from the Kafka custom resource status - basically just like the existing version of this repo's code, but with a little customization to pull out the right listener.

I'm going to make a new PR to try that out. I'll close this for now.