linkedin / Burrow

Kafka Consumer Lag Checking
Apache License 2.0

Burrow process crashed #511

Open senthil13 opened 5 years ago

senthil13 commented 5 years ago

Burrow crashes after running for some time, without showing any information other than a fatal memory error.

Below is the config file (server names removed).

[general]
pidfile="burrow.pid"
stdout-logfile="burrow.out"
access-control-allow-origin="*"

[logging]
filename="logs/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true

[zookeeper]
servers=[ "" ]
timeout=6
zookeeper-offsets=true

[cluster.dev_cluster]
client-profile="dev_cluster"
class-name="kafka"
servers=[ "" ]
topic-refresh=120
offset-refresh=30
offset-topic="__consumer_offsets"

[storage.default]
class-name="inmemory"
workers=20
intervals=15
expire-group=604800
min-distance=1

[client-profile.dev_cluster]
client-id="burrow-client"
kafka-version="0.11.0"

[consumer.consumer_dev_cluster_kafka]
class-name="kafka"
cluster="dev_cluster"
servers=[ "" ]
client-profile="dev_cluster"
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""

[consumer.consumer_dev_cluster_zk]
class-name="kafka_zk"
cluster="dev_cluster"
servers=[ "" ]
zookeeper-timeout=30
group-blacklist="^(console-consumer-|python-kafka-consumer-).*$"

[httpserver.default]
address=":8050"

Logs are attached: burrow.txt

kanapuli commented 5 years ago

Can you give more details on your machine configuration? Does it always crash, or is the crash random?

richardwalsh commented 5 years ago

We've noticed this as well but haven't quite narrowed it down. We have Burrow configured to run behind an ELB, using the burrow/admin endpoint as the health check. I'm not sure that is the ideal endpoint to hit, but we use it as a basic sanity check that "it's running". We have Burrow set up in an autoscaling group configured to ensure one instance is always running, and the scaling history shows it failing the health check anywhere from once a day to once every several days. The instance logs don't show anything interesting, so there are no useful clues to go on. Are there any flags, logs, or anything else we can enable to get more clues?
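One low-effort way to gather clues for an intermittent health-check failure like this is to run an independent probe against the same endpoint and record a timestamp for every result, so failures can be lined up with the Burrow log and the scaling history. The sketch below uses only the Python standard library. It assumes a healthy instance answers `/burrow/admin` with HTTP 200 and the body `GOOD`. The function names and the log file name are hypothetical.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch(url, timeout=5.0):
    """Return (status_code, body_text); status_code is None on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code, ""
    except (urllib.error.URLError, OSError) as exc:
        # Refused connection, DNS failure, timeout, and similar.
        return None, str(exc)


def is_healthy(status, body):
    # Assumption: a healthy Burrow replies with HTTP 200 and the body "GOOD".
    return status == 200 and body.strip() == "GOOD"


def probe_once(url, fetcher=fetch, now=None):
    """Probe the endpoint once and return a one-line log record."""
    status, body = fetcher(url)
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    state = "ok" if is_healthy(status, body) else "FAIL"
    return f"{stamp} {state} status={status} body={body.strip()[:80]!r}"


def watch(url, interval=10.0, log_path="burrow-health.log"):
    """Append one record per probe until interrupted."""
    while True:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(probe_once(url) + "\n")
        time.sleep(interval)
```

For example, `watch("http://localhost:8050/burrow/admin")` would probe the port from the config earlier in this thread every ten seconds. A `FAIL` record with `status=None` suggests the process was unreachable or gone, while a non-200 status suggests it was still up but unhealthy.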