lensesio / fast-data-dev

Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, 20+ connectors
https://lenses.io
Apache License 2.0
2.02k stars, 333 forks

zookeeper enters FATAL state after container starts #66

Closed: mahi-kandiar closed this issue 6 years ago

mahi-kandiar commented 6 years ago

When I start my Docker container, only port 3030 works; ZooKeeper, Schema Registry, etc. fail. This could be because ZooKeeper enters the FATAL state. How can I get it working?

[ec2-user@ip-172-29-3-67 ~]$ sudo docker run --rm --name pmm-confluent-kafka --net=host -e ADV_HOST=172.29.63.67 -e RUNNING_SAMPLEDATA=0 -e SAMPLEDATA=0 landoop/fast-data-dev

Setting advertised host to 172.29.63.67.
Starting services.
This is Landoop’s fast-data-dev. Kafka 1.0.1-L0 (Landoop's Kafka Distribution).
You may visit http://172.29.63.67:3030 in about a minute.
2018-05-07 14:39:30,435 CRIT Supervisor running as root (no user in config file)
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/01-zookeeper.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/02-broker.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/03-schema-registry.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/04-rest-proxy.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/05-connect-distributed.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/06-caddy.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/07-smoke-tests.conf" during parsing
2018-05-07 14:39:30,435 INFO Included extra file "/etc/supervisord.d/08-logs-to-kafka.conf" during parsing
2018-05-07 14:39:30,439 INFO supervisord started with pid 6
2018-05-07 14:39:31,442 INFO spawned: 'zookeeper' with pid 161
2018-05-07 14:39:31,446 INFO spawned: 'caddy' with pid 162
2018-05-07 14:39:31,448 INFO spawned: 'broker' with pid 163
2018-05-07 14:39:31,452 INFO spawned: 'smoke-tests' with pid 164
2018-05-07 14:39:31,455 INFO spawned: 'connect-distributed' with pid 165
2018-05-07 14:39:31,458 INFO spawned: 'logs-to-kafka' with pid 167
2018-05-07 14:39:31,466 INFO spawned: 'schema-registry' with pid 169
2018-05-07 14:39:31,472 INFO spawned: 'rest-proxy' with pid 172
2018-05-07 14:39:32,423 INFO exited: zookeeper (exit status 1; not expected)
2018-05-07 14:39:33,425 INFO spawned: 'zookeeper' with pid 439
2018-05-07 14:39:33,426 INFO success: caddy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: broker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: smoke-tests entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: connect-distributed entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: logs-to-kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: schema-registry entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,426 INFO success: rest-proxy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-07 14:39:33,829 INFO exited: zookeeper (exit status 1; not expected)
2018-05-07 14:39:36,518 INFO spawned: 'zookeeper' with pid 728
2018-05-07 14:39:36,911 INFO exited: zookeeper (exit status 1; not expected)
2018-05-07 14:39:40,532 INFO spawned: 'zookeeper' with pid 1014
2018-05-07 14:39:40,935 INFO exited: zookeeper (exit status 1; not expected)
2018-05-07 14:39:41,520 INFO gave up: zookeeper entered FATAL state, too many start retries too quickly

The logs then keep going on and on about spawning and exiting schema-registry, broker, rest-proxy, etc.

When I visit my-ip:3030, the Coyote health checks section keeps spinning.

Antwnis commented 6 years ago

It sounds like your EC2 instance does not have enough memory to start ZooKeeper, the broker, Schema Registry, Kafka Connect, and so on.

What instance type are you trying this on?

mahi-kandiar commented 6 years ago

m4.large.

andmarios commented 6 years ago

Hi @mahendran-ayyasamy. At the bottom right of the fast-data-dev UI (http://my-ip:3030) you should find a link to the logs. Can you please check the zookeeper logs?

mahi-kandiar commented 6 years ago

I am now running it on an m4.4xlarge instance.

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: ip-172-29-3-17: ip-172-29-3-17: Name does not resolve
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: ip-172-29-3-17: ip-172-29-3-17: Name does not resolve
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: ip-172-29-3-17: ip-172-29-3-17: Name does not resolve
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: ip-172-29-3-17: ip-172-29-3-17: Name does not resolve

andmarios commented 6 years ago

Hmm, maybe you can try disabling JMX for ZooKeeper? Just add -e ZK_JMX_PORT=0 to the docker run command.
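For reference, the command from the first post with that variable added might look like the sketch below. It only assembles and prints the command so it can be reviewed before running; every argument except ZK_JMX_PORT=0 is copied from the reporter's invocation, and the `adv_host` and `cmd` variable names are just illustrative.

```shell
#!/bin/sh
# Sketch: the reporter's docker run command with ZooKeeper JMX disabled.
# adv_host is the value used in the thread; replace it with your own address.
adv_host="172.29.63.67"

cmd="docker run --rm --name pmm-confluent-kafka --net=host"
cmd="$cmd -e ADV_HOST=$adv_host"
cmd="$cmd -e RUNNING_SAMPLEDATA=0 -e SAMPLEDATA=0"
cmd="$cmd -e ZK_JMX_PORT=0"   # the setting suggested above
cmd="$cmd landoop/fast-data-dev"

# Print the assembled command instead of executing it.
printf '%s\n' "$cmd"
```

Once the printed command looks right, it can be pasted into a terminal (prefixed with sudo, as in the original report).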

I've seen this error once. It comes from a specific Linux distribution on AWS (I think Ubuntu or maybe Debian), where the FQDN that Java autodetects does not resolve from within a Docker container.
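One way to confirm this kind of failure is to check whether the machine's own host name resolves. The sketch below is a rough diagnostic, not something taken from fast-data-dev itself: `resolves` is a hypothetical helper name, and it assumes the `getent` utility is available in the environment where it runs.

```shell
#!/bin/sh
# Sketch: check whether the local host name resolves via the system resolver.
# Assumes getent is installed; resolves() is an illustrative helper name.
resolves() {
  # $1 = host name; exit status 0 if it resolves, non-zero otherwise.
  getent hosts "$1" >/dev/null 2>&1
}

name="$(uname -n)"   # the node name, e.g. ip-172-29-3-17 in the error above
if resolves "$name"; then
  echo "OK: $name resolves"
else
  echo "FAIL: $name does not resolve"
fi
```

To reproduce what Java sees, the check would need to run inside the container (for example from a shell opened with docker exec), since resolution on the host and in the container can differ.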

mahi-kandiar commented 6 years ago

Same error with the new environment variable. I am using ami-aff65ad2, a community AMI.

andmarios commented 6 years ago

Ah, the Amazon AMI. Can you please also try once without the -e ADV_HOST=172.29.63.67? I guess this is the internal IP address of your VM, so it should be picked up automatically since you use --net=host.

mahi-kandiar commented 6 years ago

A variation worked. When I removed ADV_HOST, it did not work. So I kept it and removed --net=host instead, and that worked. I guess I have to open the ports one by one now.
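Without --net=host, each port has to be published explicitly with -p. The sketch below builds such a command from a port list and prints it. The list combines port 3030 and the ports mentioned elsewhere in this thread with 9092, which is an assumption here (the default Kafka broker port); the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: publish the service ports individually instead of using --net=host.
# 9092 (Kafka broker default) is an assumption; the rest appear in this thread.
ports="2181 3030 8081 8082 8083 9092 9581 9582 9583 9584"

publish=""
for p in $ports; do
  publish="$publish -p $p:$p"   # map host port p to container port p
done

cmd="docker run --rm --name pmm-confluent-kafka$publish"
cmd="$cmd -e ADV_HOST=172.29.63.67 -e RUNNING_SAMPLEDATA=0 -e SAMPLEDATA=0"
cmd="$cmd landoop/fast-data-dev"

# Print the assembled command instead of executing it.
printf '%s\n' "$cmd"
```

Docker also accepts port ranges such as -p 8081-8083:8081-8083, which shortens the command at the cost of being less explicit.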

andmarios commented 6 years ago

I am not sure whether Schema Registry, Kafka Connect and REST Proxy will work with this setup. The broker and zookeeper should have no issue though.

mahi-kandiar commented 6 years ago

OK. Without testing anything: Coyote says 100% passed. I was able to telnet to ports 8081, 8082, 8083, 2181, 9581, 9582, 9583, and 9584. I will let you know once I run some tests.
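The same reachability check can be scripted rather than done by hand with telnet. This is a minimal sketch that assumes bash and the coreutils `timeout` command are installed; `port_open` and the `KAFKA_HOST` variable are hypothetical names, and an open port only shows that something is listening, not that the service is healthy.

```shell
#!/bin/sh
# Sketch: test TCP reachability of the ports listed above.
# Assumes bash (for /dev/tcp) and coreutils timeout are available.
port_open() {
  # $1 = host, $2 = port; succeeds if a TCP connection opens within 2 seconds.
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

host="${KAFKA_HOST:-127.0.0.1}"   # set KAFKA_HOST to the Docker host's address
for port in 2181 8081 8082 8083 9581 9582 9583 9584; do
  if port_open "$host" "$port"; then
    echo "$port open"
  else
    echo "$port closed"
  fi
done
```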

mahi-kandiar commented 6 years ago

I was able to create topics etc. using port 8083.

andmarios commented 6 years ago

Glad you've worked it out @mahendran-ayyasamy!