instana / onprem-distkit

A collection of configurations that may be useful for operators of distributed Instana deployments

Release-207 integration test: Kafka error in Instana operator pod log #27

Closed: yaoyao12138 closed this issue 3 years ago

yaoyao12138 commented 3 years ago

In the Instana environment, only the Instana operator pod is launched.
Its log shows the following Kafka error:

 ts=2021-10-19T10:03:47.824978645Z level=error logger=migration msg="Kafka (kafka): could not connect to kafka" stacktrace="reflect.Value.call\n\t/usr/local/go/src/reflect/value.go:543\nreflect.Value.Call\n\t/usr/local/go/src/reflect/value.go:339\ngithub.com/chuckpreslar/emission.(*Emitter).Emit.func1\n\t/tmp/build/0c511968/repo/vendor/github.com/chuckpreslar/emission/emitter.go:209"
ts=2021-10-19T10:03:47.825058355Z level=error logger=controller-runtime.manager.controller.core msg="reconcile error" reconcilergroup=instana.io reconcilerkind=Core name=instana-core namespace=instana-core error="kafka: client has run out of available brokers to talk to (Is your cluster reachable?)" stacktrace="github.com/instana/infrastructure/self-hosted-k8s-operator/operator/controllers.(*CoreReconciler).Reconcile\n\t/tmp/build/0c511968/repo/self-hosted-k8s-operator/operator/controllers/core_controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/tmp/build/0c511968/repo/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/tmp/build/0c511968/repo/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/tmp/build/0c511968/repo/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"
ts=2021-10-19T10:03:47.836808672Z level=info logger=controller-runtime.manager.controller.core msg="status updated successfully" reconcilergroup=instana.io reconcilerkind=Core name=instana-core namespace=instana-core
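
For reference, the operator log lines above can be pulled again with kubectl. This is only a sketch; the instana-operator namespace and deployment name are assumptions and may differ in this environment:

# List the operator pod(s) (namespace is an assumption)
kubectl -n instana-operator get pods

# Tail the operator log; replace the deployment name if it differs
kubectl -n instana-operator logs deploy/instana-operator --tail=100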

In the Kafka db environment, the service is running, but it seems to restart repeatedly.
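
One way to confirm the restart loop, as a sketch (assuming the unit is named kafka as below; the NRestarts property needs a reasonably recent systemd):

# How many times systemd has restarted the unit
systemctl show kafka -p NRestarts

# Recent start/stop/failure events for the unit
journalctl -u kafka -n 50 --no-pager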

root@yao-instana-vm-32-instana-db:~# systemctl status kafka
● kafka.service - Kafka Server
     Loaded: loaded (/lib/systemd/system/kafka.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kafka.service.d
             └─override.conf
     Active: active (running) since Tue 2021-10-19 03:03:26 PDT; 2s ago
   Main PID: 1334647 (java)
      Tasks: 36 (limit: 38431)
     Memory: 89.8M
     CGroup: /system.slice/kafka.service
             └─1334647 /opt/instana/jre11/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.aw>

Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.log.LogManager.$anonfun$recoveryPointCheckpoints$1(LogManager.scala:107)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at scala.collection.StrictOptimizedIterableOps.map(StrictOptimizedIterableOps.scala:99)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at scala.collection.StrictOptimizedIterableOps.map$(StrictOptimizedIterableOps.scala:86)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at scala.collection.mutable.ArraySeq.map(ArraySeq.scala:38)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.log.LogManager.<init>(LogManager.scala:106)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.log.LogManager$.apply(LogManager.scala:1212)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.server.KafkaServer.startup(KafkaServer.scala:290)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.Kafka$.main(Kafka.scala:82)
Oct 19 03:03:28 yao-instana-vm-32-instana-db.fyre.ibm.com kafka[1334647]:         at kafka.Kafka.main(Kafka.scala)

root@yao-instana-vm-32-instana-db:~# vi /var/log/syslog

Oct 19 03:27:24 yao-instana-vm-32-instana-db kafka[1457135]: [2021-10-19 03:27:24,439] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
Oct 19 03:27:24 yao-instana-vm-32-instana-db systemd[1]: kafka.service: Main process exited, code=exited, status=1/FAILURE
Oct 19 03:27:24 yao-instana-vm-32-instana-db systemd[1]: kafka.service: Failed with result 'exit-code'.
Oct 19 03:27:25 yao-instana-vm-32-instana-db systemd[1]: kafka.service: Scheduled restart job, restart counter is at 27148.
Oct 19 03:27:25 yao-instana-vm-32-instana-db systemd[1]: Stopped Kafka Server.
Oct 19 03:27:25 yao-instana-vm-32-instana-db systemd[1]: Started Kafka Server.
Oct 19 03:27:26 yao-instana-vm-32-instana-db zkServer.sh[116275]: 2021-10-19 03:27:26,441 [myid:2] - INFO  [CommitProcessor:2:LeaderSessionTracker@104] - Committing global session 0x30068cf2bd95331
Oct 19 03:27:27 yao-instana-vm-32-instana-db zkServer.sh[116275]: 2021-10-19 03:27:27,276 [myid:2] - INFO  [CommitProcessor:2:LeaderSessionTracker@104] - Committing global session 0x30068cf2bd95332
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: [2021-10-19 03:27:27,571] WARN No meta.properties file under dir /mnt/data/kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: [2021-10-19 03:27:27,740] ERROR Disk error while locking directory /mnt/data/kafka (kafka.server.LogDirFailureChannel)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: java.nio.file.AccessDeniedException: /mnt/data/kafka/.lock
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.utils.FileLock.<init>(FileLock.scala:31)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.log.LogManager.$anonfun$lockLogDirs$1(LogManager.scala:235)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:117)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:104)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at scala.collection.mutable.ArraySeq.flatMap(ArraySeq.scala:38)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.log.LogManager.lockLogDirs(LogManager.scala:233)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.log.LogManager.<init>(LogManager.scala:105)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.log.LogManager$.apply(LogManager.scala:1212)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.server.KafkaServer.startup(KafkaServer.scala:290)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.Kafka$.main(Kafka.scala:82)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at kafka.Kafka.main(Kafka.scala)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: [2021-10-19 03:27:27,750] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: java.nio.file.AccessDeniedException: /mnt/data/kafka/recovery-point-offset-checkpoint
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
Oct 19 03:27:27 yao-instana-vm-32-instana-db kafka[1457489]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
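
The AccessDeniedException on /mnt/data/kafka/.lock and recovery-point-offset-checkpoint points to an ownership or permission problem on the Kafka data directory. A quick check, as a sketch (it assumes the kafka unit is expected to run as the kafka user):

# Which user the service runs as
systemctl show kafka -p User

# Ownership and permissions of the data directory and its contents
ls -ld /mnt/data/kafka
ls -la /mnt/data/kafka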
BonnieKris commented 3 years ago
  1. Ran chown kafka:kafka /mnt/data/kafka on both nodes.
  2. Changed broker.id to 2 in /etc/kafka/server.properties on 9.30.246.11, so that broker.id is a unique integer for each broker.
  3. Restarted the kafka service (see the sketch below).
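
The same steps as a shell sketch; the broker.id value and the 9.30.246.11 node are specific to this environment:

# 1. Fix ownership of the Kafka data directory (run on both db nodes)
chown kafka:kafka /mnt/data/kafka

# 2. Make broker.id unique per broker; here it was set to 2 on 9.30.246.11
#    by editing /etc/kafka/server.properties so that it contains: broker.id=2

# 3. Restart the service and verify
systemctl restart kafka
systemctl status kafka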

So far it is working fine as follows:

root@yao-instana-vm-32-2-instana-db:/etc/kafka# systemctl status kafka
● kafka.service - Kafka Server
     Loaded: loaded (/lib/systemd/system/kafka.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kafka.service.d
             └─override.conf
     Active: active (running) since Tue 2021-10-19 19:56:14 PDT; 19min ago
   Main PID: 1404013 (java)
      Tasks: 80 (limit: 38431)
     Memory: 282.2M
     CGroup: /system.slice/kafka.service
             └─1404013 /opt/instana/jre11/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Dcom.sun.m>

Oct 19 19:56:14 yao-instana-vm-32-2-instana-db.fyre.ibm.com systemd[1]: Started Kafka Server.

@yaoyao12138

yaoyao12138 commented 3 years ago

Thanks for your effort, Kafka now works normally on both nodes.
The Instana operator log also shows that Kafka is healthy:

ts=2021-10-20T03:51:32.078913816Z level=info logger=migration msg="Kafka (kafka): is healthy"