JohnOmernik opened 8 years ago
So one option I am exploring: since I am running Mesos-DNS, there are automatic records created that have the information we are looking for. Say my mesos-kafka framework was called "kafkaprod" and I started 5 brokers. I could, for example, get the IP by looking up the A record "broker-2.kafkaprod.mesos" (assuming my Mesos-DNS domain is ".mesos"). To get the port, I could look up the SRV record "_broker-2._tcp.kafkaprod.mesos". Now, the challenge is getting that information at Kafka Connect startup and then rewriting (or sed-ing) the Kafka Connect properties every time it starts. Perhaps instead of starting it with its normal shell script, use a different shell script that does this magic first. That should allow things to work... thoughts?
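The lookup step above could be sketched roughly like this. SRV lookups aren't in the Python standard library, so this shells out to `dig`; the framework name "kafkaprod", the broker id, and the properties path are all placeholders from the discussion, and the parsing assumes the usual `dig +short` answer format:

```python
# Sketch: resolve a broker's host via the Mesos-DNS A record and its port
# via the SRV record, then use the result to rewrite the connect properties.
import subprocess


def parse_srv(answer: str) -> int:
    """`dig +short` on an SRV record prints '<priority> <weight> <port> <target>'."""
    return int(answer.split()[2])


def broker_endpoint(framework: str, broker_id: int) -> str:
    """Resolve 'host:port' for one broker (names are placeholders)."""
    host = subprocess.check_output(
        ["dig", "+short", f"broker-{broker_id}.{framework}.mesos", "A"],
        text=True).strip().splitlines()[0]
    srv = subprocess.check_output(
        ["dig", "+short", f"_broker-{broker_id}._tcp.{framework}.mesos", "SRV"],
        text=True).strip().splitlines()[0]
    return f"{host}:{parse_srv(srv)}"


# A wrapper script would then substitute broker_endpoint("kafkaprod", 2)
# into bootstrap.servers in the connect properties file before exec'ing
# the normal Kafka Connect startup script.
```

The wrapper would have to rerun on every restart, since broker ports change as Mesos reschedules tasks.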
Hey @JohnOmernik, it sounds like a good resolution would be an update to the scheduler. The scheduler could accept the connection for the metadata request and facilitate it; the scheduler would be your DNS record. You could make a script now that does that through the API, parses the JSON response, and formats it into the structure connect wants. Not very clean or repeatable, but it is a good right-now solution until the scheduler can do it.
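That "right now" script might look something like the following. The `/api/broker/list` path and the JSON shape (`brokers[].task.endpoint`) are assumptions here, not the documented kafka-mesos API, so check your scheduler's actual response first:

```python
# Sketch: ask the scheduler's HTTP API for running brokers and turn the
# response into a bootstrap.servers value for the connect properties.
import json
import urllib.request


def endpoints_from_status(status: dict) -> str:
    """Join endpoints of running brokers (assumed payload shape)."""
    return ",".join(
        b["task"]["endpoint"] for b in status["brokers"] if b.get("task"))


def fetch_bootstrap(scheduler: str) -> str:
    """scheduler is 'host:port' of the kafka-mesos API (placeholder)."""
    with urllib.request.urlopen(f"http://{scheduler}/api/broker/list") as resp:
        return endpoints_from_status(json.load(resp))
```

Brokers without a running task are skipped, since they have no endpoint to bootstrap against.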
How about something like a simple TCP socket forwarder for bootstrap (like http://www.nakov.com/books/inetjava/source-code-html/Chapter-1-Sockets/1.4-TCP-Sockets/TCPForwardServer.java.html)? The Kafka client just connects for the initialization and disconnects after getting the metadata. In my test this worked fine.
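A forwarder along the lines of the linked TCPForwardServer can be sketched in a few lines of Python; the broker address and ports would be placeholders for whatever Mesos assigned:

```python
# Sketch: a stable bootstrap address that pipes each incoming connection to
# a real broker. Clients fetch metadata through it, then talk to brokers
# directly, so the tunnel is only used briefly.
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the source closes, then close the sink."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        dst.close()


def forward(listen_port: int, broker_host: str, broker_port: int) -> socket.socket:
    """Listen locally and forward every connection to the broker."""
    server = socket.create_server(("127.0.0.1", listen_port))

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listener was closed
            upstream = socket.create_connection((broker_host, broker_port))
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

Since all post-bootstrap traffic goes directly to the brokers, this forwarder never becomes a data-path bottleneck.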
Talking with Ewen at Strata, I learned that the Kafka community is moving away from using ZooKeeper to find Kafka brokers. This is smart from a security perspective. That said, they want producers/consumers to connect to a list of brokers to get the initial cluster information. This is a good move for Kafka, but adds some challenges to kafka-mesos running Kafka 0.9. Basically, how do we point clients at a DNS hostname or load balancer that represents the Kafka hosts and ports running in the cluster? I'd love for kafka-mesos to be able to listen on a service port (so if running in Marathon, it would open the API port, but also a "service" port for the cluster which would basically proxy to one of the brokers on the broker port); clients get the cluster state information and away they go. Per Ewen, after the initial connection through a load balancer (or through kafka-mesos) all other communication is direct to the broker, so that won't be an issue; we just need one sane place to put into something like Kafka Connect as a bootstrap server to get that initial cluster information. Would love other feedback here and thoughts from folks on potential workarounds.
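For the "one sane place" option, a plain TCP load balancer in front of the brokers would also work as the bootstrap address. A hypothetical HAProxy fragment (hostnames and ports are placeholders, and only the initial metadata fetch flows through it):

```haproxy
# Stable bootstrap address for Kafka clients; after the metadata response,
# clients connect to the advertised broker addresses directly.
listen kafka-bootstrap
    bind *:9092
    mode tcp
    balance roundrobin
    server broker0 broker-0.kafkaprod.mesos:31000 check
    server broker1 broker-1.kafkaprod.mesos:31001 check
```

The catch is the same as above: broker ports move when Mesos reschedules tasks, so something would still need to regenerate this config from Mesos-DNS or the scheduler API.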