linkedin / Burrow

Kafka Consumer Lag Checking
Apache License 2.0

kafka: client has run out of available brokers to talk to (Is your cluster reachable?) #380

Open akamalov opened 6 years ago

akamalov commented 6 years ago

Environment:

OS: RHEL 7.3
Docker 1.13.1-cs4
Burrow: latest
Kafka: Confluent-3.3.0
Authentication: SASL_PLAINTEXT

Problem:

Container-built Burrow cannot connect to Kafka nodes

{"level":"error","ts":1522848692.2069006,"msg":"failed to start client","type":"module","coordinator":"cluster","class":"kafka","name":"dp_cnj_kafka_prod","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"}

Steps to reproduce:

Clone repository

git clone https://github.com/linkedin/Burrow

Edit the configuration file, burrow.toml:

[general]
pidfile="burrow.pid"
stdout-logfile="burrow.out"
access-control-allow-origin="*"

[logging]
filename="/var/log/burrow.log"
level="debug"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true

[zookeeper]
servers=["server-cnj-prod-z1:2181","server-cnj-prod-z2:2181","server-cnj-prod-z3:2181"]
timeout=6
zookeeper-offsets=true

[cluster.dp_cnj_kafka_prod]
client-profile="dp_cnj_kafka_prod"
class-name="kafka"
servers=["server-cnj-prod-k1:9092","server-cnj-prod-k2:9092","server-cnj-prod-k3:9092"]
topic-refresh=120
offset-refresh=30
offset-topic="__consumer_offsets"

[storage.default]
class-name="inmemory"
workers=20
intervals=15
expire-group=604800
min-distance=1

[client-profile.dp_cnj_kafka_prod]
kafka-version="0.11.0"
client-id="burrow-client"
sasl="sasl_dp_cnj_kafka"

[sasl.sasl_dp_cnj_kafka]
username="admin"
password="XXXXXXXXXXXXX"
handshake-first=false

[consumer.consumer_dp_cnj_kafka]
class-name="kafka"
cluster="dp_cnj_kafka_prod"
client-profile="dp_cnj_kafka_prod"
servers=["server-cnj-prod-k1:9092","server-cnj-prod-k2:9092","server-cnj-prod-k3:9092"]
start_latest=false
offset_topic="__consumer_offsets"
group-whitelist=".*"
group-blacklist="^(console-consumer-|python-kafka-consumer-).*$"

[consumer.consumer_dp_cnj_zk]
class-name="kafka_zk"
cluster="dp_cnj_kafka_prod"
servers=["server-cnj-prod-z1:2181","server-cnj-prod-z2:2181","server-cnj-prod-z3:2181"]
zookeeper-timeout=30
group-blacklist="^(console-consumer-|python-kafka-consumer-).*$"

[httpserver.default]
address=":8000"

################################################################

Build container:

docker build -t akamalov/burrow:1.0 .

Deploy the container with volume mounts that map the configuration directory and the log directory into the container:

docker run -d  -v /opt/burrow:/etc/burrow -v /opt/burrow/logs:/var/log/  -p 8000:8000 akamalov/burrow:1.0

################################################################

burrow.log output:

{"level":"info","ts":1522848690.73516,"msg":"Started Burrow"}
{"level":"info","ts":1522848690.7353127,"msg":"configuring","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848690.7359643,"msg":"configuring","type":"coordinator","name":"storage"}
{"level":"info","ts":1522848690.7359943,"msg":"configuring","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1522848690.7360544,"msg":"configuring","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1522848690.7360692,"msg":"configuring","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1522848690.7360852,"msg":"configuring","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1522848690.7361557,"msg":"configuring","type":"coordinator","name":"notifier"}
{"level":"info","ts":1522848690.7361748,"msg":"configuring","type":"coordinator","name":"cluster"}
{"level":"info","ts":1522848690.7361956,"msg":"configuring","type":"module","coordinator":"cluster","class":"kafka","name":"dp_cnj_kafka_prod"}
{"level":"info","ts":1522848690.7367117,"msg":"configuring","type":"coordinator","name":"consumer"}
{"level":"info","ts":1522848690.7367504,"msg":"configuring","type":"module","coordinator":"consumer","class":"kafka","name":"consumer_dp_cnj_kafka"}
{"level":"info","ts":1522848690.7372715,"msg":"configuring","type":"module","coordinator":"consumer","class":"kafka_zk","name":"consumer_dp_cnj_zk"}
{"level":"info","ts":1522848690.7377377,"msg":"starting","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848690.7678626,"msg":"Connected to 192.168.2.27:2181","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848690.7947993,"msg":"Authenticated: id=171768459403460873, timeout=6000","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848690.794854,"msg":"Re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848690.8221726,"msg":"starting","type":"coordinator","name":"storage"}
{"level":"info","ts":1522848690.8222377,"msg":"starting","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1522848690.822296,"msg":"starting","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1522848690.8223126,"msg":"starting","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1522848690.8223212,"msg":"starting","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1522848690.8226066,"msg":"started listener","type":"coordinator","name":"httpserver","listener":"[::]:8000"}
{"level":"info","ts":1522848690.8226492,"msg":"starting","type":"coordinator","name":"notifier"}
{"level":"info","ts":1522848690.8226643,"msg":"starting","type":"coordinator","name":"cluster"}
{"level":"info","ts":1522848690.8227122,"msg":"starting","type":"module","coordinator":"cluster","class":"kafka","name":"dp_cnj_kafka_prod"}
{"level":"info","ts":1522848690.9740508,"msg":"starting evaluations","type":"coordinator","name":"notifier"}
{"level":"error","ts":1522848692.2069006,"msg":"failed to start client","type":"module","coordinator":"cluster","class":"kafka","name":"dp_cnj_kafka_prod","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"}
{"level":"info","ts":1522848692.206969,"msg":"stopping","type":"coordinator","name":"notifier"}
{"level":"info","ts":1522848692.2069814,"msg":"shutdown","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1522848692.2070007,"msg":"stopping","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1522848692.207008,"msg":"stopping","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1522848692.207018,"msg":"stopping","type":"coordinator","name":"storage"}
{"level":"info","ts":1522848692.207027,"msg":"stopping","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1522848692.2071266,"msg":"stopping","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848692.2337039,"msg":"Recv loop terminated: err=EOF","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1522848692.2337453,"msg":"Send loop terminated: err=<nil>","type":"coordinator","name":"zookeeper"}
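In a log like the one above, the single error line is easy to miss among the info entries. Here is a minimal sketch, assuming the log holds one JSON object per line with the `level`, `ts`, `msg`, `name` and `error` fields seen in this output. It keeps only the error entries and converts `ts` to a UTC timestamp. The function name `error_entries` is hypothetical and not part of Burrow.

```python
import json
from datetime import datetime, timezone


def error_entries(lines):
    """Yield (utc_time, name, msg, error) for each error-level JSON log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        # "ts" is seconds since the Unix epoch, as a float.
        when = datetime.fromtimestamp(entry.get("ts", 0.0), tz=timezone.utc)
        yield when, entry.get("name"), entry.get("msg"), entry.get("error")


# Two lines copied from the log above.
sample = [
    '{"level":"info","ts":1522848690.73516,"msg":"Started Burrow"}',
    '{"level":"error","ts":1522848692.2069006,"msg":"failed to start client",'
    '"type":"module","coordinator":"cluster","class":"kafka",'
    '"name":"dp_cnj_kafka_prod","error":"kafka: client has run out of '
    'available brokers to talk to (Is your cluster reachable?)"}',
]

for when, name, msg, error in error_entries(sample):
    print(when.isoformat(), name, msg, error, sep=" | ")
```

Passing an open file handle for `/var/log/burrow.log` instead of `sample` should work the same way, since the function only iterates over lines.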

Burrow does not seem to be able to reach the Kafka nodes, yet a manual telnet to one of the brokers succeeds:

$ telnet server-cnj-prod-k1 9092
Trying 192.168.2.25...
Connected to server-cnj-prod-k1.
Escape character is '^]'.
^C
Connection closed by foreign host.
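A successful telnet shows only that one broker accepts TCP connections from wherever the command was run. It says nothing about the other brokers, about name resolution inside the container, or about SASL. Below is a rough sketch that separates DNS failures from connection failures for each address in a `servers` list. The functions `check_broker` and `check_all` are hypothetical helpers. The check is only meaningful when run inside the Burrow container, which assumes a Python interpreter is available there.

```python
import socket


def check_broker(address, timeout=3.0):
    """Return (ok, detail) for a "host:port" string after a DNS lookup and TCP connect."""
    host, _, port = address.rpartition(":")
    try:
        infos = socket.getaddrinfo(host, int(port), type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed: {exc}"
    # Use the first resolved address, as a simple client would.
    family, socktype, proto, _, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:
        return False, f"TCP connect to {sockaddr[0]} failed: {exc}"
    return True, f"connected to {sockaddr[0]}"


def check_all(servers):
    """Check every broker address and return {address: (ok, detail)}."""
    return {address: check_broker(address) for address in servers}
```

Calling `check_all(["server-cnj-prod-k1:9092", "server-cnj-prod-k2:9092", "server-cnj-prod-k3:9092"])` would report each broker from the configuration separately. A passing result still only rules out network-level causes.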

Any ideas? Your help is very much appreciated. Thank you.

################################################################

akamalov commented 6 years ago

It looks like there is a bug with Burrow when Kafka is configured with SASL_PLAINTEXT: https://github.com/linkedin/Burrow/issues/333#issuecomment-378793491

akamalov commented 6 years ago

Is this project dead?

tommyJimmy87 commented 6 years ago

+1

toddpalino commented 6 years ago

@akamalov, why do you have handshake-first explicitly set to false in your config? Unless you are using a proxy (which is not currently supported in Burrow), this will break your SASL connection. For now, explicitly set it to "true" (as there's a problem with the default config right now, per #333).

akamalov commented 6 years ago

I tested both ways: first with 'false', then with 'true'. The current setting is:

handshake-first=true

echo-xu commented 6 years ago

Just FYI, setting handshake-first=true fixed the problem for me.

thefunkjunky commented 5 years ago

Setting handshake-first=true does NOT work for me.

burrow.toml:

[general]
pidfile="/var/lock/burrow/burrow.pid"
stdout-logfile="burrow.out"

[zookeeper]
servers=[
  "my.zookeeper:2181"
]
timeout=6
root-path="/burrow"

[client-profile.myclient]
kafka-version="2.0.0"
client-id="burrow-myclient"
tls="mytlsprofile"
sasl="mysaslprofile"

[tls.mytlsprofile]
certfile="whatever.crt"
keyfile="whatever.key"
cafile="whatever/ca-bundle.crt"
noverify=false

[sasl.mysaslprofile]
username="burrow"
password="****"
handshake-first=true

[httpserver.tlslistener]
address=":8443"
timeout=300
tls="mytlsprofile"

[cluster.myclustername]
class-name="kafka"
servers=[
  "kafkabroker:9093"
]
client-profile="myclient"
topic-refresh=120
offset-refresh=30

[consumer.myconsumers]
class-name="kafka"
cluster="myclustername"
servers=[
  "kafkabroker:9093"
]
client-profile="myclient"
offsets-topic="__consumer_offsets"
start-latest=true
group-whitelist=".*"

burrow-stdout.log:

{"level":"info","ts":1553733907.134768,"msg":"Started Burrow"}
{"level":"info","ts":1553733907.1348245,"msg":"configuring","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733907.1350582,"msg":"configuring","type":"coordinator","name":"storage"}
{"level":"info","ts":1553733907.1350753,"msg":"configuring","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1553733907.1351273,"msg":"configuring","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1553733907.135139,"msg":"configuring","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1553733907.1351595,"msg":"configuring","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1553733907.143542,"msg":"configuring","type":"coordinator","name":"notifier"}
{"level":"info","ts":1553733907.143559,"msg":"configuring","type":"coordinator","name":"cluster"}
{"level":"info","ts":1553733907.143576,"msg":"configuring","type":"module","coordinator":"cluster","class":"kafka","name":"myclustername"}
{"level":"info","ts":1553733907.1542895,"msg":"configuring","type":"coordinator","name":"consumer"}
{"level":"info","ts":1553733907.1543238,"msg":"configuring","type":"module","coordinator":"consumer","class":"kafka","name":"myconsumers"}
{"level":"info","ts":1553733907.165999,"msg":"starting","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733907.1682487,"msg":"Connected to 10.172.58.162:2181","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733907.1699815,"msg":"Authenticated: id=101824693704917831, timeout=6000","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733907.1699991,"msg":"Re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733907.1723924,"msg":"starting","type":"coordinator","name":"storage"}
{"level":"info","ts":1553733907.1724021,"msg":"starting","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1553733907.172434,"msg":"starting","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1553733907.1724522,"msg":"starting","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1553733907.1724596,"msg":"starting","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1553733907.1725779,"msg":"started listener","type":"coordinator","name":"httpserver","listener":"[::]:8080"}
{"level":"info","ts":1553733907.1725914,"msg":"starting","type":"coordinator","name":"notifier"}
{"level":"info","ts":1553733907.1726022,"msg":"starting","type":"coordinator","name":"cluster"}
{"level":"info","ts":1553733907.1726058,"msg":"starting","type":"module","coordinator":"cluster","class":"kafka","name":"myclustername"}
{"level":"info","ts":1553733907.2764928,"msg":"starting evaluations","type":"coordinator","name":"notifier"}
{"level":"error","ts":1553733908.0209036,"msg":"failed to start client","type":"module","coordinator":"cluster","class":"kafka","name":"myclustername","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"}
{"level":"info","ts":1553733908.0209348,"msg":"stopping","type":"coordinator","name":"notifier"}
{"level":"info","ts":1553733908.0209424,"msg":"shutdown","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1553733908.020983,"msg":"stopping","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1553733908.020989,"msg":"stopping","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1553733908.0209956,"msg":"stopping","type":"coordinator","name":"storage"}
{"level":"info","ts":1553733908.0210028,"msg":"stopping","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1553733908.021035,"msg":"stopping","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733908.0232832,"msg":"Recv loop terminated: err=EOF","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553733908.023301,"msg":"Send loop terminated: err=<nil>","type":"coordinator","name":"zookeeper"}

I am using Confluent Kafka 2.0.0 with SASL_SSL authentication on port 9093.

I used Ncat to check whether I could open a connection to the broker host and port:

Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 34.210.208.13:9093.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

That worked fine. I also compared the SASL username and password on the broker with those in the Burrow config, and they match.
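An Ncat connect with 0 bytes exchanged only confirms that the port is open. It does not exercise the TLS layer that a SASL_SSL listener expects before any Kafka traffic. Below is a hedged sketch that attempts a TLS handshake against the broker using Python's standard `ssl` module. The function `tls_handshake` is a hypothetical helper. The host, port and CA file passed to it would be whatever the `servers` and `[tls.*]` entries in the config name.

```python
import socket
import ssl


def tls_handshake(host, port, cafile=None, timeout=5.0):
    """Try a TLS handshake; return (ok, detail) without sending Kafka traffic."""
    # With cafile=None the system trust store is used; pass the CA bundle
    # from the Burrow TLS profile to mirror what Burrow would verify against.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"handshake ok, protocol {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        return False, f"certificate verification failed: {exc.verify_message}"
    except ssl.SSLError as exc:
        return False, f"TLS error: {exc}"
    except OSError as exc:
        return False, f"connection error: {exc}"
```

If the broker requires client certificates, `context.load_cert_chain(certfile, keyfile)` would be needed before the handshake. A successful handshake narrows the problem to the SASL step but does not validate the credentials themselves.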