Open edgan opened 6 years ago
Logs:
{"level":"info","ts":1537562399.1103215,"msg":"stopping","type":"coordinator","name":"consumer"}
{"level":"info","ts":1537562399.1103935,"msg":"stopping","type":"module","coordinator":"consumer","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.11174,"msg":"stopping","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.1142273,"msg":"Recv loop terminated: err=EOF","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.114262,"msg":"Send loop terminated: err=<nil>","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.1172094,"msg":"stopping","type":"coordinator","name":"cluster"}
{"level":"info","ts":1537562399.1172383,"msg":"stopping","type":"module","coordinator":"cluster","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.1172647,"msg":"stopping","type":"coordinator","name":"notifier"}
{"level":"info","ts":1537562399.1172776,"msg":"shutdown","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1537562399.1173916,"msg":"stopping","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1537562399.1174622,"msg":"stopping","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1537562399.11998,"msg":"stopping","type":"coordinator","name":"storage"}
{"level":"info","ts":1537562399.120022,"msg":"stopping","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1537562399.120106,"msg":"stopping","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1233048,"msg":"Recv loop terminated: err=EOF","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1233966,"msg":"Send loop terminated: err=<nil>","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1533234,"msg":"Started Burrow"}
{"level":"info","ts":1537562399.153498,"msg":"configuring","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1536365,"msg":"configuring","type":"coordinator","name":"storage"}
{"level":"info","ts":1537562399.1537273,"msg":"configuring","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1537562399.1539078,"msg":"configuring","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1537562399.1539931,"msg":"configuring","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1537562399.1540968,"msg":"configuring","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1537562399.1546085,"msg":"configuring","type":"coordinator","name":"notifier"}
{"level":"info","ts":1537562399.154642,"msg":"configuring","type":"coordinator","name":"cluster"}
{"level":"info","ts":1537562399.1547153,"msg":"configuring","type":"module","coordinator":"cluster","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.1548803,"msg":"configuring","type":"coordinator","name":"consumer"}
{"level":"info","ts":1537562399.1549742,"msg":"configuring","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.1552308,"msg":"configuring","type":"module","coordinator":"consumer","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.1554272,"msg":"starting","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1559339,"msg":"Connected to 10.2.30.121:2181","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1586473,"msg":"Authenticated: id=100470660053855640, timeout=6000","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1586776,"msg":"Re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1537562399.1622052,"msg":"starting","type":"coordinator","name":"storage"}
{"level":"info","ts":1537562399.1622314,"msg":"starting","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1537562399.162305,"msg":"starting","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1537562399.1623225,"msg":"starting","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1537562399.1623378,"msg":"starting","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1537562399.1625447,"msg":"started listener","type":"coordinator","name":"httpserver","listener":"[::]:8000"}
{"level":"info","ts":1537562399.162597,"msg":"starting","type":"coordinator","name":"notifier"}
{"level":"info","ts":1537562399.162623,"msg":"starting","type":"coordinator","name":"cluster"}
{"level":"info","ts":1537562399.1626327,"msg":"starting","type":"module","coordinator":"cluster","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.1970122,"msg":"starting","type":"coordinator","name":"consumer"}
{"level":"info","ts":1537562399.197053,"msg":"starting","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.1983898,"msg":"Connected to 10.2.30.114:2181","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.2010047,"msg":"Authenticated: id=172350794107511546, timeout=30000","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.201041,"msg":"Re-submitting `0` credentials after reconnect","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1537562399.2022557,"msg":"starting","type":"module","coordinator":"consumer","class":"kafka","name":"local"}
{"level":"info","ts":1537562399.20591,"msg":"starting consumers","type":"module","coordinator":"consumer","class":"kafka","name":"local","topic":"__consumer_offsets","count":50}
{"level":"warn","ts":1537562399.2137485,"msg":"failed to read offset","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","group":"secor_backup","topic":"connection","partition":0,"error":"zk: node does not exist"}
{"level":"warn","ts":1537562399.2176504,"msg":"failed to read offset","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","group":"secor_backup","topic":"namehere4","partition":1,"error":"zk: node does not exist"}
{"level":"warn","ts":1537562399.2189863,"msg":"failed to read offset","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","group":"secor_backup","topic":"namehere4","partition":2,"error":"zk: node does not exist"}
{"level":"warn","ts":1537562399.232973,"msg":"failed to read offset","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","group":"secor_backup","topic":"namehere4","partition":6,"error":"zk: node does not exist"}
{"level":"info","ts":1537562399.2663305,"msg":"starting evaluations","type":"coordinator","name":"notifier"}
{"level":"info","ts":1537562418.18032,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"namehere","showall":false}
{"level":"info","ts":1537562438.8329,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"namehere3","showall":false}
{"level":"info","ts":1537562450.3536875,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"namehere2","showall":false}
{"level":"info","ts":1537562456.9158769,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"namehere","showall":false}
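To make logs like the above easier to scan, here is a minimal Python sketch that tallies messages per consumer group. It assumes only the JSON-lines format shown in these logs, where the group name appears under either a "group" or a "consumer" key. The function name `tally_group_messages` is made up for illustration and is not part of Burrow.

```python
import json
from collections import Counter


def tally_group_messages(lines):
    """Count (msg, group) pairs in Burrow JSON log lines.

    Only entries carrying a "group" or "consumer" field are counted.
    Blank lines and lines that are not JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        # "failed to read offset" entries use "group";
        # "cluster or consumer not found" entries use "consumer".
        name = entry.get("group") or entry.get("consumer")
        if name:
            counts[(entry.get("msg"), name)] += 1
    return counts
```

Calling it on an open log file, for example `tally_group_messages(open("/var/log/burrow/burrow.log"))`, returns a `Counter` whose `most_common()` shows which groups are hit most often by each message.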
I have tried playing with the whitelist/blacklist, with no effect. I have also diffed the staging and prod configuration files, and the only difference is the IP addresses.
burrow.toml:
pidfile="/run/burrow/burrow.pid"
stdout-logfile="burrow.out"
client-id="burrow-lagchecker"
[logging]
filename="/var/log/burrow/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true
[zookeeper]
servers=[ "10.2.30.200:2181","10.2.30.114:2181","10.2.30.121:2181" ]
timeout=6
lock-path="/burrow/notifier"
root-path="/burrow"
[client-profile.test]
client-id="burrow-test"
kafka-version="0.11.0"
[cluster.local]
class-name="kafka"
servers=[ "10.2.30.200:9092","10.2.30.114:9092","10.2.30.121:9092" ]
client-profile="test"
topic-refresh=120
offset-refresh=30
[consumer.local]
class-name="kafka"
cluster="local"
servers=[ "10.2.30.200:9092","10.2.30.114:9092","10.2.30.121:9092" ]
client-profile="test"
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-|ureplicator-).*$"
group-whitelist=""
[consumer.local_zk]
class-name="kafka_zk"
cluster="local"
servers=[ "10.2.30.200:2181","10.2.30.114:2181","10.2.30.121:2181" ]
zookeeper-path="/kafka1"
zookeeper-timeout=30
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-|ureplicator-).*$"
group-whitelist=""
[httpserver.default]
address=":8000"
[storage.default]
class-name="inmemory"
workers=20
intervals=15
expire-group=604800
min-distance=1
Seeing this exact behavior.
I'm getting the same behavior: Burrow just stopped detecting a few consumers/topics, although they are actively committing offsets. I've even tried reinstalling it entirely, with no change. Burrow is also not reading any consumer ID/"owner".
@toddpalino - do you know what could be happening?
We have been seeing this behavior as well. I finally had some time to do some research.
We are using Confluent 5.1.2 and I have Burrow configured with kafka-version="2.1.0". I also have the latest code from Burrow.
In my research, I found that the metadata request from sarama was not returning data at least some of the time. In fact, it returned nothing the entire time I was debugging the code, but it must succeed at least occasionally for Burrow to work at all. As far as I can tell, this request is used to populate the topics in the in-memory storage. I also noticed that sarama checks the Kafka version to decide which metadata request version to send. See the code here.
I figured I would test whether the lower version would return data for me, so I set my Burrow client profile configuration to kafka-version="0.11.0.2". I also set the start-latest configuration to true because I only want to know about active consumers on restart/startup. I don't know if it makes a difference for others, but I did want to mention it as something I changed.
Between those two changes, all of my active consumer groups are reporting properly and are not disappearing from Burrow.
Kafka version: 0.10.2. I'm getting the error below:
{"level":"warn","ts":1563462481.399035,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: zookeeper is closing"}
{"level":"error","ts":1563462497.3998742,"msg":"failed to list groups","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","error":"zk: node does not exist"}
Kafka 0.12
{"level":"info","ts":1564469520.822827,"msg":"starting","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1564469520.825842,"msg":"Connected to [::1]:2180","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1564469520.827925,"msg":"Authenticated: id=72057722009747468, timeout=30000","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1564469520.827997,"msg":"Re-submitting `0` credentials after reconnect","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"error","ts":1564469520.829778,"msg":"failed to list groups","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk","error":"zk: node does not exist"}
I see this behavior when using kafka-console-consumer.sh. But kafka-consumer-perf-test.sh is working.
Any new ideas or solutions for this issue?
Six years have passed, and the problem still hasn't been solved.
I am running Burrow 1.1.0 with Kafka 0.11 and ZooKeeper 3.4.5. I can run "curl http://127.0.0.1:8000/v3/kafka/local/consumer" right after I start Burrow and see all the groups. But if I run "curl http://127.0.0.1:8000/v3/kafka/local/consumer/namehere/status" I get back:
{"error":false,"message":"consumer status returned","status":{"cluster":"local","group":"namehere","status":"NOTFOUND","complete":1,"partitions":[],"partition_count":0,"maxlag":null,"totallag":0},"request":{"url":"/v3/kafka/local/consumer/namehere/status","host":"hostname-01"}}
Then if I run "curl http://127.0.0.1:8000/v3/kafka/local/consumer" again, I get a list without that one.
Even weirder, this works fine in a staging environment but fails in prod, even though both run the same versions of Kafka, Burrow, and ZooKeeper.
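For anyone trying to reproduce this, the curl sequence above (list groups, ask for a status, list again) can be scripted. The following is a rough Python sketch, not a definitive tool. It assumes the consumer list endpoint returns the group names under a "consumers" key (that response body is not shown in this thread) and that the status endpoint answers with a JSON body like the NOTFOUND one above rather than an HTTP error. `find_dropped_groups` and `http_get_json` are names made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address used in the report; adjust as needed


def http_get_json(path):
    """Fetch a Burrow HTTP endpoint and decode the JSON body."""
    with urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def find_dropped_groups(cluster, get_json=http_get_json):
    """List groups, query each group's status, then list again.

    Returns the status string reported for each group and the groups
    that were in the first listing but are missing from the second.
    """
    list_path = f"/v3/kafka/{cluster}/consumer"
    # Assumption: the list response carries group names under "consumers".
    before = get_json(list_path).get("consumers", [])
    statuses = {}
    for group in before:
        body = get_json(f"{list_path}/{quote(group, safe='')}/status")
        # The nested "status" object matches the response shown above.
        statuses[group] = body.get("status", {}).get("status")
    after = set(get_json(list_path).get("consumers", []))
    dropped = [g for g in before if g not in after]
    return {"statuses": statuses, "dropped": dropped}
```

Running `find_dropped_groups("local")` against an affected instance should list, under "dropped", the groups that vanish after their status is requested. The `get_json` parameter is there only so the HTTP call can be swapped out when testing without a live Burrow.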