wurstmeister / kafka-docker

Dockerfile for Apache Kafka
http://wurstmeister.github.io/kafka-docker/
Apache License 2.0

Why is there a zombie process in the container? #497

Open sjt157 opened 5 years ago

sjt157 commented 5 years ago


platform: Ubuntu 16.04

docker-compose.yml

version: '2.1'

services:
  kafka1:
    image: wurstmeister/kafka:2.12-2.0.1
    restart: always
    hostname: kafka4
    container_name: kafka4
    ports:
    - 9097:9092
    environment:

      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka4:9092
      KAFKA_LISTENERS: PLAINTEXT://kafka4:9092
      KAFKA_BROKER_ID: 18
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      # 6 partitions and 3 replicas
      KAFKA_CREATE_TOPICS: "wave2018021:6:3,wave2018031:6:3,wave2018041:6:3"
    volumes:
    - /home/ubuntu16/Docker/data/kafka1:/kafka
    - /home/ubuntu16/Docker/logs/kafka1:/opt/kafka/logs
    external_links:
    - zoo1
    - zoo2
    - zoo3
    networks:
      mybridge:
        ipv4_address: 172.18.20.230

  kafka2:
    image: wurstmeister/kafka:2.12-2.0.1
    restart: always
    hostname: kafka5
    container_name: kafka5
    ports:
    - 9098:9092
    environment:
     # KAFKA_ADVERTISED_HOST_NAME: kafka2
     # KAFKA_ADVERTISED_PORT: 9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka5:9092
      KAFKA_LISTENERS: PLAINTEXT://kafka5:9092
      KAFKA_BROKER_ID: 19
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_CREATE_TOPICS: "taxi1:6:3"
    volumes:
    - /home/ubuntu16/Docker/data/kafka2:/kafka
    - /home/ubuntu16/Docker/logs/kafka2:/opt/kafka/logs
    external_links:
    - zoo1
    - zoo2
    - zoo3
    networks:
      mybridge:
        ipv4_address: 172.18.20.231

  kafka3:
    image: wurstmeister/kafka:2.12-2.0.1
    restart: always
    hostname: kafka6
    container_name: kafka6
    ports:
    - 9099:9092

    environment:
     # KAFKA_ADVERTISED_HOST_NAME: kafka3
     # KAFKA_ADVERTISED_PORT: 9094
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka6:9092
      KAFKA_LISTENERS: PLAINTEXT://kafka6:9092
      KAFKA_BROKER_ID: 20
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_CREATE_TOPICS: "camera1:6:3"
    volumes:
    - /home/ubuntu16/Docker/data/kafka1:/kafka
    - /home/ubuntu16/Docker/logs/kafka1:/opt/kafka/logs
    external_links:
    - zoo1
    - zoo2
    - zoo3
    networks:
      mybridge:
        ipv4_address: 172.18.20.232

networks:
  mybridge:
    external:
      name: mybridge

sscaling commented 5 years ago

The create-topics script runs in the background and is initiated by PID 1. Using disown might allow it to be reaped once it has completed; however, as the start-kafka.sh script runs as PID 1, I'm not sure it will work. It will need a little investigation to test this.

sjt157 commented 5 years ago

Do you mean adding disown to the start-kafka.sh script? And where should disown be added? After create-topics.sh &? I am not very familiar with shell.

sscaling commented 5 years ago

It would be create-topics.sh & disown, but as the script runs as PID 1, I don't think the kernel will reap the process, since that is PID 1's responsibility. We'd probably need to introduce a lightweight init system such as dumb-init to handle this scenario: https://github.com/Yelp/dumb-init#why-you-need-an-init-system
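
For illustration only, here is a minimal sketch of what "PID 1's responsibility" could look like in plain shell. It assumes the entrypoint stays alive as the parent instead of being replaced by the broker process; run_with_reaping is a hypothetical helper name, not part of kafka-docker, and nothing here is confirmed to match how start-kafka.sh works. The wrapper starts the main command in the background, forwards TERM/INT to it, and waits for it so that its exit status is collected.

```shell
#!/bin/sh
# Sketch only: run_with_reaping is a hypothetical helper, not part of kafka-docker.
# It keeps the shell alive as the parent of the main command instead of
# replacing the shell with it, so the shell can call wait on its children.

run_with_reaping() {
    # Start the main command ("$@") in the background and remember its PID.
    "$@" &
    main_pid=$!

    # Forward termination signals to the main command.
    trap 'kill -TERM "$main_pid" 2>/dev/null' TERM INT

    # wait can return early when a trapped signal arrives, so loop until
    # the main command is really gone, keeping the last status seen.
    while :; do
        wait "$main_pid"
        status=$?
        kill -0 "$main_pid" 2>/dev/null || break
    done
    return "$status"
}

# Example with stand-in commands: a short-lived background helper,
# then a main command that exits with status 3.
true &
rc=0
run_with_reaping sh -c 'exit 3' || rc=$?
echo "main command exited with status $rc"
# prints: main command exited with status 3
```

This only covers the simple case. A dedicated init such as dumb-init, or Docker's own --init option, is likely the more robust route, because it is designed to forward signals and reap every child that ends up under PID 1.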

sjt157 commented 5 years ago

I see. What do you think of this solution (https://github.com/phusion/baseimage-docker/blob/rel-0.9.16/image/bin/my_init)? Which of the two is more suitable for this scenario?

sscaling commented 5 years ago

I think for most users it's probably not a huge issue, so unless it's causing problems (such as filling up the last slot in the process table, in which case you probably have bigger issues), there's nothing to do. The Phusion solution requires Python, which seems like a lot of extra baggage to pull in (100 MB vs < 1 MB).

theBNT commented 3 years ago

Hey, we are seeing this issue: eventually no new processes can be spawned on the host because of zombie processes that all have the same parent. The deployment consists of a single broker, ZooKeeper, and AKHQ, started via docker-compose on a SLES system.

Any hints on how to debug/improve this further?

The process is started by this container:

kafka-docker_kafka "start-kafka.sh" 29 hours ago Up 29 hours 0.0.0.0:9095->9095/tcp kafka-docker_kafka_1

So every time a new topic is created (e.g. via AKHQ), a new defunct process is left hanging on the system (where 20653 is the Kafka process):

root 32753 20653 0 07:20 ? 00:00:00 [timeout] <defunct>
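
One possible starting point for debugging, sketched under assumptions: tally the defunct entries per parent PID to confirm that they all hang off the same parent. count_defunct_by_parent is a hypothetical helper name, and the awk program assumes the UID PID PPID column order seen in the line above.

```shell
#!/bin/sh
# Sketch only: count zombie (defunct) processes per parent PID.
# Assumes `ps -ef`-style input where column 3 is the parent PID and
# the last field of a zombie's line is the literal "<defunct>".

count_defunct_by_parent() {
    awk '$NF == "<defunct>" { n[$3]++ } END { for (p in n) print p, n[p] }'
}

# Example using the line reported in this thread.
printf '%s\n' 'root 32753 20653 0 07:20 ? 00:00:00 [timeout] <defunct>' |
    count_defunct_by_parent
# prints: 20653 1
```

On a live host the input would come from ps -ef instead of printf. A count that keeps growing under a single parent PID would suggest that this parent never waits on its children.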