kmg28801 / kafka-study


[실전 카프카 개발부터 운영까지] Chapter 7. Kafka operations and monitoring #17

Open kmg28801 opened 1 year ago

kmg28801 commented 1 year ago

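Configuring ZooKeeper and Kafka for stable operations

ZooKeeper configuration

Number of ZooKeeper servers

ZooKeeper hardware

Kafka configuration

Number of Kafka servers

Kafka hardware

Setting up the monitoring system

Managing and analyzing Kafka application logs

Note: log4j log levels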

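- TRACE : more detailed logging than DEBUG
- DEBUG : logs about the internal state of the application
- INFO : the default log level, general informational logs -> the default for the Kafka application
- WARN : warning-level logs
- ERROR : runtime errors or unexpected errors
- FATAL : severe errors, such as those that stop the application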

https://aws.amazon.com/amazon-linux-2/
28 package(s) needed for security, out of 30 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-31-4-136 ~]$ cat /usr/local/kafka/config/log4j.properties
# Change the two lines below to adjust the general broker logging level (output to server.log and stdout)
log4j.logger.kafka=INFO
log4j.logger.org.apache.kafka=INFO
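The two logger lines shown above can also be switched without opening an editor. The following is a minimal sketch, assuming a `sed` that accepts `-i.bak` and `-E` (GNU and BSD both do); `set_broker_log_level` is a hypothetical helper name, not something shipped with Kafka.

```shell
# set_broker_log_level FILE LEVEL  (hypothetical helper, not part of Kafka)
# Rewrites only the two general broker logger lines in a log4j.properties file
# and leaves more specific loggers (e.g. log4j.logger.kafka.request.logger) alone.
set_broker_log_level() {
  local file="$1" level="$2"

  # Accept only the standard log4j levels.
  case "$level" in
    TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ;;
    *) echo "unknown log level: $level" >&2; return 1 ;;
  esac

  # Keep a backup copy as FILE.bak, then replace the value after '='.
  sed -i.bak -E \
    "s/^(log4j\.logger\.(kafka|org\.apache\.kafka))=.*/\1=${level}/" \
    "$file"
}
```

The function needs write access to the file (the notes here edit it with `sudo`), and the broker still has to be restarted for the new level to take effect.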


- Change the log level
```cmd
[ec2-user@ip-172-31-4-136 ~]$ sudo vi /usr/local/kafka/config/log4j.properties # change both loggers from INFO to DEBUG
# Change the two lines below to adjust the general broker logging level (output to server.log and stdout)
log4j.logger.kafka=DEBUG
log4j.logger.org.apache.kafka=DEBUG
[ec2-user@ip-172-31-4-136 ~]$ sudo systemctl restart kafka-server # restart the Kafka server
[ec2-user@ip-172-31-4-136 ~]$ cat /usr/local/kafka/logs/server.log # check the server log
[2023-07-30 16:19:13,929] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2023-07-30 16:19:14,747] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2023-07-30 16:19:14,837] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2023-07-30 16:19:14,843] INFO starting (kafka.server.KafkaServer)
[2023-07-30 16:19:14,845] INFO Connecting to zookeeper on peter-zk01.foo.bar:2181,peter-zk02.foo.bar:2181,peter-zk03.foo.bar:2181 (kafka.server.KafkaServer)
[2023-07-30 16:19:14,846] DEBUG Checking login config for Zookeeper JAAS context [java.security.auth.login.config=null, zookeeper.sasl.client=default:true, zookeeper.sasl.clientconfig=default:Client] (org.apache.kafka.common.security.JaasUtils) # a DEBUG-level entry now appears
[2023-07-30 16:19:14,883] INFO [ZooKeeperClient Kafka server] Initializing a new session to peter-zk01.foo.bar:2181,peter-zk02.foo.bar:2181,peter-zk03.foo.bar:2181. (kafka.zookeeper.ZooKeeperClient)
[2023-07-30 16:19:14,899] INFO Client environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT (org.apache.zookeeper.ZooKeeper)
[2023-07-30 16:19:14,900] INFO Client environment:host.name=peter-kafka01.foo.bar (org.apache.zookeeper.ZooKeeper)
[2023-07-30 16:19:14,901] INFO Client environment:java.version=1.8.0_372 (org.apache.zookeeper.ZooKeeper)
```

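Kafka application log files and their roles

- server.log = broker configuration and informational logs; when the broker restarts, its option values are recorded here
- state-change.log = information received from the controller
- kafka-request.log = information received from clients
- log-cleaner.log = log compaction activity
- controller.log = controller-related information
- kafka-authorizer.log = authorization-related information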
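To confirm that a log level change actually took effect, it can help to count entries per level in `server.log`. This is a rough sketch that assumes the `[date time] LEVEL message` layout seen in the server log excerpt in this issue; `count_log_levels` is a hypothetical helper name.

```shell
# count_log_levels [FILE...]  (hypothetical helper)
# Counts log entries per level. Reads the given files, or stdin when none are given.
# Assumes lines look like: [2023-07-30 16:19:14,843] INFO starting (...)
count_log_levels() {
  awk '
    # Only lines that begin with a "[<digit>" timestamp are log entries;
    # continuation lines such as stack traces are skipped.
    $1 ~ /^\[[0-9]/ { count[$3]++ }
    END { for (lvl in count) print lvl, count[lvl] }
  ' "$@" | sort
}
```

For example, `count_log_levels /usr/local/kafka/logs/server.log` prints one line per level with its count, so a non-zero `DEBUG` count indicates the new level is active.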

Monitoring Kafka metrics with JMX

How to configure JMX for Kafka

  1. Check the JMX port
[ec2-user@ip-172-31-4-136 ~]$ cat /usr/local/kafka/config/jmx # check the JMX port
JMX_PORT=9999
[ec2-user@ip-172-31-4-136 ~]$ netstat -ntl | grep 9999 # check whether the JMX port is listening
tcp6       0      0 :::9999                 :::*                    LISTEN
  2. Install Prometheus
user ~/Desktop/kafka-aws $ ssh -i keypair.pem ec2-user@15.165.76.254 # ansible public IP
The authenticity of host '15.165.76.254 (15.165.76.254)' can't be established.
ED25519 key fingerprint is SHA256:bbx9U3sAWi+oHCd89qi0hXWy8dWhSH7XKlYDQXKTu2M.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:124: 13.125.218.113
    ~/.ssh/known_hosts:145: 3.34.144.59
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '15.165.76.254' (ED25519) to the list of known hosts.
Last login: Tue Jul 11 14:56:28 2023 from 110.9.16.150

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
23 package(s) needed for security, out of 25 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-31-2-254 ~]$
[ec2-user@ip-172-31-2-254 ~]$ sudo amazon-linux-extras install -y docker
Installing docker
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Cleaning repos: amzn2-core amzn2extra-ansible2 amzn2extra-docker amzn2extra-kernel-5.10
22 metadata files removed
8 sqlite files removed
0 metadata files removed
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core | 3.7 kB  00:00:00
amzn2extra-ansible2 | 3.0 kB  00:00:00
amzn2extra-docker | 3.0 kB  00:00:00
amzn2extra-kernel-5.10 | 3.0 kB  00:00:00
(1/9): amzn2-core/2/x86_64/group_gz | 2.5 kB  00:00:00
(2/9): amzn2-core/2/x86_64/updateinfo | 661 kB  00:00:00
(3/9): amzn2extra-docker/2/x86_64/primary_db | 106 kB  00:00:00
(4/9): amzn2extra-ansible2/2/x86_64/updateinfo |   76 B  00:00:00
(5/9): amzn2extra-kernel-5.10/2/x86_64/updateinfo |  34 kB  00:00:00
(6/9): amzn2extra-docker/2/x86_64/updateinfo | 9.8 kB  00:00:00
(7/9): amzn2extra-ansible2/2/x86_64/primary_db |  40 kB  00:00:00
(8/9): amzn2extra-kernel-5.10/2/x86_64/primary_db |  21 MB  00:00:00
(9/9): amzn2-core/2/x86_64/primary_db

[ec2-user@ip-172-31-2-254 ~]$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2023-07-30 07:33:09 UTC; 25s ago
     Docs: https://docs.docker.com
  Process: 3206 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
  Process: 3197 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
 Main PID: 3208 (dockerd)
    Tasks: 8
   Memory: 87.0M
   CGroup: /system.slice/docker.service
           └─3208 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --default-ulimit nofile=32768:65536

Jul 30 07:33:08 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:08.827846593Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jul 30 07:33:08 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:08.834220415Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Jul 30 07:33:08 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:08.834246199Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jul 30 07:33:08 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:08.835008593Z" level=info msg="Loading containers: start."
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:09.327216413Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:09.393052150Z" level=info msg="Loading containers: done."
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:09.540714522Z" level=info msg="Docker daemon" commit=6051f14 graphdriver(s)=overlay2 version=20.10.23
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:09.541363078Z" level=info msg="Daemon has completed initialization"
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal systemd[1]: Started Docker Application Container Engine.
Jul 30 07:33:09 ip-172-31-2-254.ap-northeast-2.compute.internal dockerd[3208]: time="2023-07-30T07:33:09.570043107Z" level=info msg="API listen on /run/docker.sock"


- Install Prometheus
```cmd
[ec2-user@ip-172-31-2-254 ~]$ sudo mkdir -p /etc/prometheus
[ec2-user@ip-172-31-2-254 ~]$ git clone https://github.com/onlybooks/kafka2 # we had already cloned this earlier
fatal: destination path 'kafka2' already exists and is not an empty directory.
[ec2-user@ip-172-31-2-254 ~]$ ls
kafka2  keypair.pem
[ec2-user@ip-172-31-2-254 ~]$ sudo cp kafka2/chapter7/prometheus.yml /etc/prometheus/
[ec2-user@ip-172-31-2-254 ~]$ sudo docker run -d --network host -p 9090:9090 -v /etc/prometheus/promethues.yml:/etc/prometheus/prometheus.yml --name prometheus prom/prometheus # typo in the host path: promethues.yml instead of prometheus.yml
Unable to find image 'prom/prometheus:latest' locally
latest: Pulling from prom/prometheus
d5c4df21b127: Pull complete
2f5f7d8898a1: Pull complete
300c29bb5b04: Pull complete
be6ad5a51a35: Pull complete
ea6cf9f81dfe: Pull complete
b5ac85a4be54: Pull complete
d32980b63d51: Pull complete
502ed6d3bdc8: Pull complete
7bed70210741: Pull complete
3b19398e1689: Pull complete
d358eb0a0392: Pull complete
d6eaeaf54563: Pull complete
Digest: sha256:d6ead9daf2355b9923479e24d7e93f246253ee6a5eb18a61b0f607219f341a80
Status: Downloaded newer image for prom/prometheus:latest
WARNING: Published ports are discarded when using host network mode
be515d4e3016cd0833c07c65bf9a4448043d065999716fe42300a14712b1de75
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/etc/prometheus/promethues.yml" to rootfs at "/etc/prometheus/prometheus.yml": mount /etc/prometheus/promethues.yml:/etc/prometheus/prometheus.yml (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
[ec2-user@ip-172-31-2-254 ~]$ sudo docker run -d --network host -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml --name prometheus prom/prometheus
docker: Error response from daemon: Conflict. The container name "/prometheus" is already in use by container "be515d4e3016cd0833c07c65bf9a4448043d065999716fe42300a14712b1de75". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
[ec2-user@ip-172-31-2-254 ~]$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
[ec2-user@ip-172-31-2-254 ~]$ docker ps -al
CONTAINER ID   IMAGE             COMMAND                  CREATED          STATUS    PORTS     NAMES
be515d4e3016   prom/prometheus   "/bin/prometheus --c…"   57 seconds ago   Created             prometheus
[ec2-user@ip-172-31-2-254 ~]$ docker rm be515d4e3016 # remove the container left over from the failed run
be515d4e3016
[ec2-user@ip-172-31-2-254 ~]$ sudo docker run -d --network host -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml --name prometheus prom/prometheus # create it again, correctly this time
WARNING: Published ports are discarded when using host network mode
bfc54bd2e77174067f1a0d55cbb9cc735d08a76dab989e4357b3f35f3c855092
[ec2-user@ip-172-31-2-254 ~]$ sudo docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED          STATUS          PORTS     NAMES
bfc54bd2e771   prom/prometheus   "/bin/prometheus --c…"   12 seconds ago   Up 12 seconds             prometheus
```
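The first `docker run` above failed because the host path was misspelled: when a `-v` bind-mount source does not exist, Docker creates it as a directory, and a directory cannot be mounted onto the file `/etc/prometheus/prometheus.yml`. The following is a minimal sketch of a guard that checks the file before starting the container; `run_prometheus` is a hypothetical wrapper name. It also leaves out `-p 9090:9090`, since the warning in the transcript shows that published ports are discarded in host network mode.

```shell
# run_prometheus [CONFIG]  (hypothetical wrapper around the docker run command above)
# Refuses to start the container unless the config file exists on the host,
# which avoids the "mount a directory onto a file" error caused by a mistyped path.
run_prometheus() {
  local config="${1:-/etc/prometheus/prometheus.yml}"

  if [ ! -f "$config" ]; then
    echo "config file not found: $config" >&2
    return 1
  fi

  # Host networking exposes port 9090 directly, so no -p flag is needed.
  sudo docker run -d --network host \
    -v "$config":/etc/prometheus/prometheus.yml \
    --name prometheus prom/prometheus
}
```

If a mistyped path was already used once, it is worth checking whether Docker left an empty directory behind at that path and removing it.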

Install Grafana

[ec2-user@ip-172-31-2-254 ~]$ sudo docker run -d --network host -p 3000:3000 --name grafana grafana/grafana:7.3.7
Unable to find image 'grafana/grafana:7.3.7' locally
7.3.7: Pulling from grafana/grafana
801bfaa63ef2: Pull complete
efdb3434c59e: Pull complete
8cbdb3f56d34: Pull complete
34f82d4bd2ec: Pull complete
af445b3382af: Pull complete
4f4fb700ef54: Pull complete
8aab09bbec8e: Pull complete
9e81c23e3db5: Pull complete
Digest: sha256:5f19b6c385e8bfb8e5c9ecc7cdd123a453af3cf01e7c20d20059e770f656286d
Status: Downloaded newer image for grafana/grafana:7.3.7
WARNING: Published ports are discarded when using host network mode
86e96a2ab124588e213fd72420dab5eb23cc45a63488150f6070b4f5a0b82e54

[ec2-user@ip-172-31-2-254 ~]$ sudo docker ps
CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS     NAMES
86e96a2ab124   grafana/grafana:7.3.7   "/run.sh"                14 seconds ago   Up 11 seconds             grafana
bfc54bd2e771   prom/prometheus         "/bin/prometheus --c…"   4 minutes ago    Up 4 minutes              prometheus

Install the exporters

[ec2-user@ip-172-31-4-136 ~]$ sudo mkdir -p /usr/local/jmx
[ec2-user@ip-172-31-4-136 ~]$ sudo yum -y install git
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core | 3.7 kB 00:00:00
Package git-2.40.1-1.amzn2.0.1.x86_64 already installed and latest version
Nothing to do
[ec2-user@ip-172-31-4-136 ~]$ git clone https://github.com/onlybooks/kafka2
Cloning into 'kafka2'...
remote: Enumerating objects: 302, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 302 (delta 26), reused 25 (delta 23), pack-reused 262
Receiving objects: 100% (302/302), 27.58 MiB | 13.90 MiB/s, done.
Resolving deltas: 100% (74/74), done.


- Copying the files needed to run the exporter, plus some trial and error
```cmd
[ec2-user@ip-172-31-4-136 ~]$ vi jmx_prometheus_httpserver.yml
[ec2-user@ip-172-31-4-136 ~]$ cd usr/local/jmx
-bash: cd: usr/local/jmx: No such file or directory
[ec2-user@ip-172-31-4-136 ~]$ ls
kafka2
[ec2-user@ip-172-31-4-136 ~]$ ls
kafka2
[ec2-user@ip-172-31-4-136 ~]$ cd ./.
[ec2-user@ip-172-31-4-136 ~]$ cd ..
[ec2-user@ip-172-31-4-136 home]$ ls
ec2-user
[ec2-user@ip-172-31-4-136 home]$ cd ..
[ec2-user@ip-172-31-4-136 /]$ ls
bin  boot  data  dev  etc  home  lib  lib64  local  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[ec2-user@ip-172-31-4-136 /]$ cd usr
[ec2-user@ip-172-31-4-136 usr]$ cd local
[ec2-user@ip-172-31-4-136 local]$ cd jmx
[ec2-user@ip-172-31-4-136 jmx]$ ls
jmx_prometheus_httpserver-0.13.1-SNAPSHOT-jar-with-dependencies.jar  jmx_prometheus_httpserver.yml
[ec2-user@ip-172-31-4-136 jmx]$ cat jmx_promethues_httpserver.yml
cat: jmx_promethues_httpserver.yml: No such file or directory
[ec2-user@ip-172-31-4-136 jmx]$ cat jmx_prometheu_httpserver.yml
cat: jmx_prometheu_httpserver.yml: No such file or directory
[ec2-user@ip-172-31-4-136 jmx]$ cat jmx_prometheus_httpserver.yml
hostPort: 127.0.0.1:9999 # IP and port where JMX is running
ssl: false # whether to use SSL
rules: # list of rules, applied in order
  - pattern: ".*"
```

[Service]
Type=simple
Restart=always
# Command that runs the JMX exporter. It currently uses port 7071; to use a different port, change it here (we stick with the book).
ExecStart=/usr/bin/java -jar /usr/local/jmx/jmx_prometheus_httpserver-0.13.1-SNAPSHOT-jar-with-dependencies.jar 7071 /usr/local/jmx/jmx_prometheus_httpserver.yml

[Install]
WantedBy=multi-user.target


- Watch which directory you are in, and always run `systemctl daemon-reload` after changing `systemd` unit files on a Linux system

```cmd
[ec2-user@ip-172-31-4-136 kafka2]$ cd chapter7
[ec2-user@ip-172-31-4-136 chapter7]$ ls
7_commands.txt  jmx-exporter.service  jmx_prometheus_httpserver-0.13.1-SNAPSHOT-jar-with-dependencies.jar  jmx_prometheus_httpserver.yml  kafka_metrics.json  node-exporter.service  prometheus.yml  예제
[ec2-user@ip-172-31-4-136 chapter7]$ vi jmx-exporter.service
[ec2-user@ip-172-31-4-136 chapter7]$ sudo cp kafka2/chapter7/jmx-exporter.service /etc/systemd/system
cp: cannot stat `kafka2/chapter7/jmx-exporter.service': No such file or directory
[ec2-user@ip-172-31-4-136 chapter7]$ sudo cp jmx-exporter.service /etc/systemd/system
[ec2-user@ip-172-31-4-136 chapter7]$ sudo systemctl daemon-reload
[ec2-user@ip-172-31-4-136 chapter7]$ sudo systemctl start jmx-exporter
[ec2-user@ip-172-31-4-136 chapter7]$ sudo systemctl status jmx-exporter
● jmx-exporter.service - JMX Exporter for Kafka
   Loaded: loaded (/etc/systemd/system/jmx-exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2023-07-30 16:53:39 KST; 6s ago
 Main PID: 24287 (java)
   CGroup: /system.slice/jmx-exporter.service
           └─24287 /usr/bin/java -jar /usr/local/jmx/jmx_prometheus_httpserver-0.13.1-SNAPSHOT-jar-with-dependencies.jar 7071 /usr/local/jmx/jmx_prometheus_httpserver.yml

 Jul 30 16:53:39 ip-172-31-4-136.ap-northeast-2.compute.internal systemd[1]: Started JMX Exporter for Kafka.

[ec2-user@ip-172-31-4-136 chapter7]$ curl http://localhost:7071/metrics
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP kafka_server_socket_server_metrics_connection_creation_total The total number of new connections established (kafka.server<type=socket-server-metrics, listener=PLAINTEXT, networkProcessor=1><>connection-creation-total)
# TYPE kafka_server_socket_server_metrics_connection_creation_total untyped
kafka_server_socket_server_metrics_connection_creation_total{listener="PLAINTEXT",networkProcessor="1",} 0.0
kafka_server_socket_server_metrics_connection_creation_total{listener="PLAINTEXT",networkProcessor="3",} 0.0
kafka_server_socket_server_metrics_connection_creation_total{listener="PLAINTEXT",networkProcessor="2",} 0.0
kafka_server_socket_server_metrics_connection_creation_total{listener="PLAINTEXT",networkProcessor="0",} 1.0
# HELP kafka_controller_ControllerChannelManager_MeanRate Attribute exposed for management (kafka.controller<type=ControllerChannelManager, name=RequestRateAndQueueTimeMs, broker-id=1><>MeanRate)
# TYPE kafka_controller_ControllerChannelManager_MeanRate untyped
kafka_controller_ControllerChannelManager_MeanRate{name="RequestRateAndQueueTimeMs",broker_id="1",} 0.0033385209862075963
# HELP kafka_controller_ControllerStats_99thPercentile Attribute exposed for management (kafka.controller<type=ControllerStats, name=ListPartitionReassignmentRateAndTimeMs><>99thPercentile)
# TYPE kafka_controller_ControllerStats_99thPercentile untyped
kafka_controller_ControllerStats_99thPercentile{name="ListPartitionReassignmentRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="ControlledShutdownRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="PartitionReassignmentRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="ManualLeaderBalanceRateAndTimeMs",} 3.143332
kafka_controller_ControllerStats_99thPercentile{name="UncleanLeaderElectionEnableRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="TopicChangeRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="ControllerChangeRateAndTimeMs",} 879.436079
kafka_controller_ControllerStats_99thPercentile{name="IsrChangeRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="TopicDeletionRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="LogDirChangeRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="TopicUncleanLeaderElectionEnableRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="LeaderElectionRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="ControllerShutdownRateAndTimeMs",} 0.0
kafka_controller_ControllerStats_99thPercentile{name="LeaderAndIsrResponseReceivedRateAndTimeMs",} 2.242129
kafka_controller_ControllerStats_99thPercentile{name="AutoLeaderBalanceRateAndTimeMs",} 15.382767
# HELP kafka_server_socket_server_metrics_io_wait_time_ns_avg The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. (kafka.server<type=socket-server-metrics, listener=PLAINTEXT, networkProcessor=1><>io-wait-time-ns-avg)
# TYPE kafka_server_socket_server_metrics_io_wait_time_ns_avg untyped
kafka_server_socket_server_metrics_io_wait_time_ns_avg{listener="PLAINTEXT",networkProcessor="1",} 3.003696505642458E8
```

Repeat the commands above on the kafka01~03 servers.
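The `/metrics` output is long, so when checking each broker it can be easier to pull out a single metric. The following is a small sketch that filters the Prometheus text format by exact metric name; `metric_samples` is a hypothetical helper name, and the parsing assumes one sample per line as in the output above.

```shell
# metric_samples NAME  (hypothetical helper)
# Reads Prometheus text-format metrics on stdin and prints only the sample
# lines whose metric name is exactly NAME (labels and value are kept).
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)    # drop the {label="..."} part, if any
      if (metric == name) print
    }'
}
```

A typical call would be `curl -s http://localhost:7071/metrics | metric_samples kafka_server_socket_server_metrics_connection_creation_total`, which prints one line per `networkProcessor` label.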

[ec2-user@ip-172-31-4-136 ~]$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-386.tar.gz
--2023-07-30 17:21:24--  https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-386.tar.gz
Resolving github.com (github.com)... 20.200.245.247
Connecting to github.com (github.com)|20.200.245.247|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/9524057/2ae54580-afed-11ea-9edf-c8e21bb074f9?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230730%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230730T082124Z&X-Amz-Expires=300&X-Amz-Signature=8366a9db94e2d8b94fb6ebf6d935a9cb19d3ae4b7e3171a2564de05292c22443&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=9524057&response-content-disposition=attachment%3B%20filename%3Dnode_exporter-1.0.1.linux-386.tar.gz&response-content-type=application%2Foctet-stream [following]
--2023-07-30 17:21:24--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/9524057/2ae54580-afed-11ea-9edf-c8e21bb074f9?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230730%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230730T082124Z&X-Amz-Expires=300&X-Amz-Signature=8366a9db94e2d8b94fb6ebf6d935a9cb19d3ae4b7e3171a2564de05292c22443&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=9524057&response-content-disposition=attachment%3B%20filename%3Dnode_exporter-1.0.1.linux-386.tar.gz&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9284696 (8.9M) [application/octet-stream]
Saving to: ‘node_exporter-1.0.1.linux-386.tar.gz’

100%[======================================================================================================================================================================================================================================>] 9,284,696   16.3MB/s   in 0.5s

2023-07-30 17:21:25 (16.3 MB/s) - ‘node_exporter-1.0.1.linux-386.tar.gz’ saved [9284696/9284696]
[ec2-user@ip-172-31-4-136 ~]$ sudo tar zxf node_exporter-1.0.1.linux-386.tar.gz -C /usr/local/ # extract the archive
[ec2-user@ip-172-31-4-136 ~]$ sudo ln -s /usr/local/node_exporter-1.0.1.linux-386 /usr/local/node_exporter # create a symbolic link
[ec2-user@ip-172-31-4-136 ~]$ sudo cp kafka2/chapter7/node-exporter.service /etc/systemd/system
[ec2-user@ip-172-31-4-136 ~]$ sudo systemctl daemon-reload

[ec2-user@ip-172-31-4-136 ~]$ sudo systemctl start node-exporter
[ec2-user@ip-172-31-4-136 ~]$ sudo systemctl status node-exporter
● node-exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node-exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2023-07-30 17:23:24 KST; 4s ago
 Main PID: 10098 (node_exporter)
   CGroup: /system.slice/node-exporter.service
           └─10098 /usr/local/node_exporter/node_exporter

 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.412Z caller=node_exporter.go:112 collector=thermal_zone
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.412Z caller=node_exporter.go:112 collector=time
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.412Z caller=node_exporter.go:112 collector=timex
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:112 collector=udp_queues
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:112 collector=uname
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:112 collector=vmstat
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:112 collector=xfs
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:112 collector=zfs
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=node_exporter.go:191 msg="Listening on" address=:9100
 Jul 30 17:23:24 ip-172-31-4-136.ap-northeast-2.compute.internal node_exporter[10098]: level=info ts=2023-07-30T08:23:24.413Z caller=tls_config.go:170 msg="TLS is disabled and it cannot be enabled on the fly." http2=false

Install this on all of the kafka01~03 servers as well.
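Since both exporters have to be running on every broker, a quick loop can confirm that each endpoint answers. This is only a sketch: the hostnames follow the `peter-kafkaNN.foo.bar` naming seen in this issue, the ports are the 7071 (JMX exporter) and 9100 (node exporter) values from the transcripts, and `exporter_urls` and `check_exporters` are hypothetical helper names.

```shell
# Brokers to check; adjust to your environment (assumed from the hostnames in this issue).
KAFKA_HOSTS="peter-kafka01.foo.bar peter-kafka02.foo.bar peter-kafka03.foo.bar"

# exporter_urls: print the JMX exporter (7071) and node exporter (9100)
# metrics URL for every host in KAFKA_HOSTS, one per line.
exporter_urls() {
  local host port
  for host in $KAFKA_HOSTS; do
    for port in 7071 9100; do
      echo "http://${host}:${port}/metrics"
    done
  done
}

# check_exporters: request each URL and report whether it responded successfully.
check_exporters() {
  local url
  for url in $(exporter_urls); do
    if curl -sf --max-time 3 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Running `check_exporters` from the Prometheus host is the more useful check, because it also exercises name resolution and any security group rules between that host and the brokers.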

Analyzing the Prometheus configuration file

[ec2-user@ip-172-31-2-254 ~]$ cat /etc/prometheus/prometheus.yml
# prometheus config
global: # global Prometheus settings
  scrape_interval:     5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'peter-jmx-kafka' # defines what Prometheus scrapes (job name and target list); this job covers the JMX exporter
    static_configs:
      - targets:
          - peter-kafka01.foo.bar:7071
          - peter-kafka02.foo.bar:7071
          - peter-kafka03.foo.bar:7071

  - job_name: 'peter-kafka-nodes' # defines what Prometheus scrapes; this job covers the node exporter
    static_configs:
      - targets:
          - peter-kafka01.foo.bar:9100
          - peter-kafka02.foo.bar:9100
          - peter-kafka03.foo.bar:9100

  - job_name: 'peter-kafka-exporter'
    static_configs:
      - targets:
          - peter-kafka01.foo.bar:9308
          - peter-kafka02.foo.bar:9308
          - peter-kafka03.foo.bar:9308
[ec2-user@ip-172-31-2-254 ~]$
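To cross-check the config against the exporters that were actually started, the `host:port` targets can be listed straight from the file. This is a rough text-based sketch rather than a real YAML parser: it assumes each target sits on its own `- host:port` line, as in the file above, and `list_scrape_targets` is a hypothetical helper name.

```shell
# list_scrape_targets [FILE...]  (hypothetical helper)
# Prints every "- host:port" list entry found in the given files (or stdin).
# Lines such as "- job_name: ..." or "- targets:" are skipped because they
# do not end in ":<digits>".
list_scrape_targets() {
  grep -E '^[[:space:]]*-[[:space:]]+[^[:space:]#]+:[0-9]+[[:space:]]*$' "$@" \
    | sed -E 's/^[[:space:]]*-[[:space:]]+//; s/[[:space:]]*$//'
}
```

For example, `list_scrape_targets /etc/prometheus/prometheus.yml` should print nine targets for the file shown here: three brokers each on ports 7071, 9100, and 9308.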
[ec2-user@ip-172-31-4-136 ~]$ cat kafka2/chapter7/kafka_metrics.json
(contents omitted because they are too long)

JMX monitoring metrics

Kafka exporter