odpi / egeria-charts

Helm chart repository
https://odpi.github.io/egeria-charts
Apache License 2.0
13 stars 9 forks source link

Default odpi-egeria-lab chart install doesn't work on local Rancher deskop #263

Closed lpalashevski closed 1 year ago

lpalashevski commented 1 year ago

Problem: odpi-egeria-lab chart installation is not complete when running under local kubernetes provided by Rancher desktop environment.

Due to technical error (details below) Kafka broker is not available for the Egeria applications preventing the user to run/test using this configuration.

To reproduce install (any) of v4 releases on

Rancher Desktop v1.7.0 Apple M1 Pro MacOs Ventura 13.3.1 (a)

> helm install lab egeria/odpi-egeria-lab --set jupyter.tokenPlain='secret' --set imageDefaults.pullPolicy=Always

This will install the chart successfully

> helm list
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                   APP VERSION
lab     default     1           2023-05-09 15:15:01.028401 +0200 CEST   deployed    odpi-egeria-lab-4.0.1   4.0

however, strimzi operator will fail to bring up zookeeper relicas (subsequently kafka) getting into loop

here are some details from k8s env:

NAME                                                    READY   STATUS             RESTARTS      AGE
pod/lab-strimzi-zookeeper-1                             0/1     CrashLoopBackOff   5 (67s ago)   5m15s
pod/lab-strimzi-zookeeper-0                             0/1     CrashLoopBackOff   5 (65s ago)   5m15s

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m9s                   default-scheduler  Successfully assigned default/lab-strimzi-zookeeper-1 to lima-rancher-desktop
  Warning  Unhealthy  4m47s (x6 over 5m47s)  kubelet            Readiness probe failed:
  Warning  Unhealthy  4m47s (x6 over 5m47s)  kubelet            Liveness probe failed:
  Normal   Killing    4m47s (x2 over 5m27s)  kubelet            Container zookeeper failed liveness probe, will be restarted
  Warning  Unhealthy  4m47s (x2 over 5m27s)  kubelet            Readiness probe failed: Ncat: Connection reset by peer.
  Normal   Pulled     4m47s (x3 over 6m7s)   kubelet            Container image "quay.io/strimzi/kafka:0.34.0-kafka-3.4.0" already present on machine
  Normal   Created    4m47s (x3 over 6m7s)   kubelet            Created container zookeeper
  Normal   Started    4m47s (x3 over 6m7s)   kubelet            Started container zookeeper
  Warning  BackOff    57s (x7 over 2m7s)     kubelet            Back-off restarting failed container

Digging deeper looks like this is caused by internal k8s networking/socker/ssl handshake error between zookeper instances, preventing startup and readiness of the pods. This failure was not investigated in details (will attach pod logs in separate comment for further investigation).

Reverting the default chart properties to instruct strimzi operator to install single instance (I think this was the case before) resolves the problem.

> helm install lab egeria/odpi-egeria-lab --set jupyter.tokenPlain='secret' --set imageDefaults.pullPolicy=Always --set kafka.replicas=1 --set zookeeper.replicas=1

Then all pods are up and running:

NAME                                                    READY   STATUS    RESTARTS   AGE
pod/lab-odpi-egeria-lab-presentation-568845b6c9-pw2p6   1/1     Running   0          47m
pod/lab-odpi-egeria-lab-uistatic-58448fccf9-gdqdb       1/1     Running   0          47m
pod/lab-odpi-egeria-lab-ui-645d5b6f9-trcx2              1/1     Running   0          47m
pod/lab-odpi-egeria-lab-factory-0                       1/1     Running   0          47m
pod/lab-odpi-egeria-lab-core-0                          1/1     Running   0          47m
pod/lab-odpi-egeria-lab-datalake-0                      1/1     Running   0          47m
pod/strimzi-cluster-operator-675749c788-mplk9           1/1     Running   0          47m
pod/lab-odpi-egeria-lab-dev-0                           1/1     Running   0          47m
pod/lab-strimzi-zookeeper-0                             1/1     Running   0          47m
pod/lab-odpi-egeria-lab-jupyter-557658b746-v2jfl        1/1     Running   0          47m
pod/lab-strimzi-kafka-0                                 1/1     Running   0          47m
pod/lab-strimzi-entity-operator-db897fc6c-nt564         3/3     Running   0          46m
lpalashevski commented 1 year ago
k logs pod/lab-strimzi-zookeeper-1 -f
Detected Zookeeper ID 2
Preparing truststore
Adding /opt/kafka/cluster-ca-certs/ca.crt to truststore /tmp/zookeeper/cluster.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore is complete
Looking for the right CA
Found the right CA: /opt/kafka/cluster-ca-certs/ca.crt
Preparing keystore for client and quorum listeners
Preparing keystore for client and quorum listeners is complete
Starting Zookeeper with configuration:
# The directory where the snapshot is stored.
dataDir=/var/lib/zookeeper/data

# Other options
4lw.commands.whitelist=*
standaloneEnabled=false
reconfigEnabled=true
clientPort=12181
clientPortAddress=127.0.0.1

# TLS options
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.clientAuth=need
ssl.quorum.clientAuth=need
secureClientPort=2181
sslQuorum=true

ssl.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.trustStore.password=[hidden]
ssl.trustStore.type=PKCS12
ssl.quorum.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.quorum.trustStore.password=[hidden]
ssl.quorum.trustStore.type=PKCS12

ssl.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.keyStore.password=[hidden]
ssl.keyStore.type=PKCS12
ssl.quorum.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.quorum.keyStore.password=[hidden]
ssl.quorum.keyStore.type=PKCS12

# Provided configuration
tickTime=2000
initLimit=5
syncLimit=2
autopurge.purgeInterval=1
admin.enableServer=false

# Zookeeper nodes configuration
server.1=lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc:2888:3888:participant;127.0.0.1:12181
server.2=lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc:2888:3888:participant;127.0.0.1:12181

+ exec /usr/bin/tini -w -e 143 -- /opt/kafka/bin/zookeeper-server-start.sh /tmp/zookeeper.properties
2023-05-09 10:37:16,808 INFO Reading configuration from: /tmp/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,809 INFO clientPortAddress is 127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,809 INFO secureClientPortAddress is 0.0.0.0:2181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,813 INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util) [main]
2023-05-09 10:37:16,813 INFO observerMasterPort is not set (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,813 INFO metricsProvider.className is org.apache.zookeeper.metrics.impl.DefaultMetricsProvider (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,818 WARN No server failure will be tolerated. You need at least 3 servers. (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2023-05-09 10:37:16,820 INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager) [main]
2023-05-09 10:37:16,821 INFO autopurge.purgeInterval set to 1 (org.apache.zookeeper.server.DatadirCleanupManager) [main]
2023-05-09 10:37:16,821 INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager) [PurgeTask]
2023-05-09 10:37:16,822 INFO Log4j 1.2 jmx support not found; jmx disabled. (org.apache.zookeeper.jmx.ManagedUtil) [main]
2023-05-09 10:37:16,822 INFO Starting quorum peer, myid=2 (org.apache.zookeeper.server.quorum.QuorumPeerMain) [main]
2023-05-09 10:37:16,824 INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog) [PurgeTask]
2023-05-09 10:37:16,827 INFO zookeeper.snapshot.compression.method = CHECKED (org.apache.zookeeper.server.persistence.SnapStream) [PurgeTask]
2023-05-09 10:37:16,828 INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager) [PurgeTask]
2023-05-09 10:37:16,830 INFO ServerMetrics initialized with provider org.apache.zookeeper.metrics.impl.DefaultMetricsProvider@7219ec67 (org.apache.zookeeper.server.ServerMetrics) [main]
2023-05-09 10:37:16,862 INFO zookeeper.client.portUnification=false (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,862 INFO zookeeper.netty.advancedFlowControl.enabled = false (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,863 INFO handshakeThrottlingEnabled = false, zookeeper.netty.server.outstandingHandshake.limit = -1 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,879 INFO Using org.apache.zookeeper.server.NettyServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2023-05-09 10:37:16,879 WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2023-05-09 10:37:16,879 INFO zookeeper.client.portUnification=false (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,879 INFO zookeeper.netty.advancedFlowControl.enabled = false (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,879 INFO handshakeThrottlingEnabled = false, zookeeper.netty.server.outstandingHandshake.limit = -1 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,880 INFO Using org.apache.zookeeper.server.NettyServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2023-05-09 10:37:16,880 WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory) [main]
2023-05-09 10:37:16,882 INFO zookeeper.quorumCnxnTimeoutMs=-1 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog) [main]
2023-05-09 10:37:16,885 INFO Local sessions disabled (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO Local session upgrading disabled (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO tickTime set to 2000 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO minSessionTimeout set to 4000 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO maxSessionTimeout set to 40000 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO initLimit set to 5 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO syncLimit set to 2 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,885 INFO connectToLearnerMasterLimit set to 0 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,890 INFO  (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO   ______                  _                                           (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO  |___  /                 | |                                          (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO     / /    ___     ___   | | __   ___    ___   _ __     ___   _ __    (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO    / /    / _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__| (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO   / /__  | (_) | | (_) | |   <  |  __/ |  __/ | |_) | |  __/ | |     (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO  /_____|  \___/   \___/  |_|\_\  \___|  \___| | .__/   \___| |_| (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO                                               | |                      (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO                                               |_|                      (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO  (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,890 INFO Server environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:host.name=lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc.cluster.local (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.version=17.0.6 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.vendor=Red Hat, Inc. (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.home=/usr/lib/jvm/java-17-openjdk-17.0.6.0.10-3.el8_7.aarch64 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.class.path=/opt/kafka/bin/../libs/accessors-smart-2.4.7.jar:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/annotations-13.0.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/checker-qual-3.5.0.jar:/opt/kafka/bin/../libs/commons-cli-1.4.jar:/opt/kafka/bin/../libs/commons-lang-2.6.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-3.4.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-3.4.0.jar:/opt/kafka/bin/../libs/connect-json-3.4.0.jar:/opt/kafka/bin/../libs/connect-mirror-3.4.0.jar:/opt/kafka/bin/../libs/connect-mirror-client-3.4.0.jar:/opt/kafka/bin/../libs/connect-runtime-3.4.0.jar:/opt/kafka/bin/../libs/connect-transforms-3.4.0.jar:/opt/kafka/bin/../libs/cruise-control-metrics-reporter-2.5.112.jar:/opt/kafka/bin/../libs/error_prone_annotations-2.3.4.jar:/opt/kafka/bin/../libs/failureaccess-1.0.1.jar:/opt/kafka/bin/../libs/gson-2.9.0.jar:/opt/kafka/bin/../libs/guava-30.1-jre.jar:/opt/kafka/bin/../libs/hk2-api-2.6.1.jar:/opt/kafka/bin/../libs/hk2-locator-2.6.1.jar:/opt/kafka/bin/../libs/hk2-utils-2.6.1.jar:/opt/kafka/bin/../libs/j2objc-annotations-1.3.jar:/opt/kafka/bin/../libs/jackson-annotations-2.13.4.jar:/opt/kafka/bin/../libs/jackson-core-2.13.4.jar:/opt/kafka/bin/../libs/jackson-databind-2.13.4.2.jar:/opt/kafka/bin/../libs/jackson-dataformat-csv-2.13.4.jar:/opt/kafka/bin/../libs/jackson-dataformat-yaml-2.13.4.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.13.4.jar:/opt/kafka/bin/../libs/jackson-datatype-jsr310-2.13.4.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.13.4.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.13.4.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.13.4.jar:/opt/kafka/bin/../libs/jackson-module-scala_2.13-2.13.4.jar:/opt/kafka/bin/../libs/jaeger-client-1.8.1.jar:/opt/kafka/bin/../libs/jaeger-core-1.8.1.jar:/opt/kafka/bin/../libs/jaeger-thrift-1.8.1.jar:/opt/kafka/bin/../libs/jaeger-tracerresolver-1.8.1.jar:/opt/kafka/bin/../libs/jakarta.activation-api-1.2.2.jar:/opt/kafka/bin/../libs/jakarta.annotation-api-1.3.5.jar:/opt/kafka/bin/../libs/jakarta.inject-2.6.1.jar:/opt/kafka/bin/../libs/jakarta.validation-api-2.0.2.jar:/opt/kafka/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/opt/kafka/bin/../libs/jakarta.xml.bind-api-2.3.3.jar:/opt/kafka/bin/../libs/javassist-3.27.0-GA.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.3.2.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.34.jar:/opt/kafka/bin/../libs/jersey-common-2.34.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.34.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.34.jar:/opt/kafka/bin/../libs/jersey-hk2-2.34.jar:/opt/kafka/bin/../libs/jersey-server-2.34.jar:/opt/kafka/bin/../libs/jetty-client-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-http-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-io-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-security-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-server-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-util-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jetty-util-ajax-9.4.48.v20220622.jar:/opt/kafka/bin/../libs/jline-3.21.0.jar:/opt/kafka/bin/../libs/jmx_prometheus_javaagent-0.17.2.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/jose4j-0.7.9.jar:/opt/kafka/bin/../libs/json-path-2.6.0.jar:/opt/kafka/bin/../libs/json-smart-2.4.7.jar:/opt/kafka/bin/../libs/jsonevent-layout-1.7.jar:/opt/kafka/bin/../libs/jsr305-3.0.2.jar:/opt/kafka/bin/../libs/kafka-agent-0.34.0.jar:/opt/kafka/bin/../libs/kafka-clients-3.4.0.jar:/opt/kafka/bin/../libs/kafka-env-var-config-provider-1.1.0.jar:/opt/kafka/bin/../libs/kafka-group-coordinator-3.4.0.jar:/opt/kafka/bin/../libs/kafka-kubernetes-config-provider-1.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-3.4.0.jar:/opt/kafka/bin/../libs/kafka-metadata-3.4.0.jar:/opt/kafka/bin/../libs/kafka-oauth-client-0.12.0.jar:/opt/kafka/bin/../libs/kafka-oauth-common-0.12.0.jar:/opt/kafka/bin/../libs/kafka-oauth-keycloak-authorizer-0.12.0.jar:/opt/kafka/bin/../libs/kafka-oauth-server-0.12.0.jar:/opt/kafka/bin/../libs/kafka-oauth-server-plain-0.12.0.jar:/opt/kafka/bin/../libs/kafka-quotas-plugin-0.2.0.jar:/opt/kafka/bin/../libs/kafka-raft-3.4.0.jar:/opt/kafka/bin/../libs/kafka-server-common-3.4.0.jar:/opt/kafka/bin/../libs/kafka-shell-3.4.0.jar:/opt/kafka/bin/../libs/kafka-storage-3.4.0.jar:/opt/kafka/bin/../libs/kafka-storage-api-3.4.0.jar:/opt/kafka/bin/../libs/kafka-streams-3.4.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-3.4.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.13-3.4.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-3.4.0.jar:/opt/kafka/bin/../libs/kafka-tools-3.4.0.jar:/opt/kafka/bin/../libs/kafka_2.13-3.4.0.jar:/opt/kafka/bin/../libs/kotlin-stdlib-1.4.10.jar:/opt/kafka/bin/../libs/kotlin-stdlib-common-1.4.0.jar:/opt/kafka/bin/../libs/kubernetes-client-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-client-api-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-httpclient-okhttp-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-apps-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-batch-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-certificates-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-common-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-core-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-extensions-6.2.0.jar:/opt/kafka/bin/../libs/kubernetes-model-gatewayapi-6.2.0.jar:/opt/kafka/bin/../libs/libthrift-0.15.0.jar:/opt/kafka/bin/../libs/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/opt/kafka/bin/../libs/logging-interceptor-3.12.12.jar:/opt/kafka/bin/../libs/lz4-java-1.8.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.8.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/metrics-core-4.1.12.1.jar:/opt/kafka/bin/../libs/mirror-maker-2-extensions-1.2.0.jar:/opt/kafka/bin/../libs/mirror-maker-agent-0.34.0.jar:/opt/kafka/bin/../libs/netty-buffer-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-codec-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-common-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-handler-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-resolver-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-transport-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-transport-classes-epoll-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-transport-native-epoll-4.1.78.Final.jar:/opt/kafka/bin/../libs/netty-transport-native-unix-common-4.1.78.Final.jar:/opt/kafka/bin/../libs/nimbus-jose-jwt-9.10.jar:/opt/kafka/bin/../libs/okhttp-4.9.3.jar:/opt/kafka/bin/../libs/okio-2.8.0.jar:/opt/kafka/bin/../libs/opa-authorizer-1.5.1.jar:/opt/kafka/bin/../libs/opentelemetry-api-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-context-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-exporter-common-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-exporter-otlp-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-exporter-otlp-common-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-instrumentation-api-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-instrumentation-api-semconv-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentelemetry-kafka-clients-2.6-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentelemetry-kafka-clients-common-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-common-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-extension-autoconfigure-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-extension-autoconfigure-spi-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-logs-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-metrics-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-sdk-trace-1.18.0.jar:/opt/kafka/bin/../libs/opentelemetry-semconv-1.18.0-alpha.jar:/opt/kafka/bin/../libs/opentracing-api-0.33.0.jar:/opt/kafka/bin/../libs/opentracing-kafka-client-0.1.15.jar:/opt/kafka/bin/../libs/opentracing-noop-0.33.0.jar:/opt/kafka/bin/../libs/opentracing-tracerresolver-0.1.8.jar:/opt/kafka/bin/../libs/opentracing-util-0.33.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.3.jar:/opt/kafka/bin/../libs/paranamer-2.8.jar:/opt/kafka/bin/../libs/plexus-utils-3.3.0.jar:/opt/kafka/bin/../libs/reflections-0.9.12.jar:/opt/kafka/bin/../libs/reload4j-1.2.19.jar:/opt/kafka/bin/../libs/rocksdbjni-7.1.2.jar:/opt/kafka/bin/../libs/scala-collection-compat_2.13-2.6.0.jar:/opt/kafka/bin/../libs/scala-java8-compat_2.13-1.0.2.jar:/opt/kafka/bin/../libs/scala-library-2.13.10.jar:/opt/kafka/bin/../libs/scala-logging_2.13-3.9.4.jar:/opt/kafka/bin/../libs/scala-reflect-2.13.10.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.30.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.36.jar:/opt/kafka/bin/../libs/slf4j-reload4j-1.7.36.jar:/opt/kafka/bin/../libs/snakeyaml-1.33.jar:/opt/kafka/bin/../libs/snappy-java-1.1.8.4.jar:/opt/kafka/bin/../libs/swagger-annotations-2.2.0.jar:/opt/kafka/bin/../libs/tracing-agent-0.34.0.jar:/opt/kafka/bin/../libs/trogdor-3.4.0.jar:/opt/kafka/bin/../libs/zjsonpatch-0.3.0.jar:/opt/kafka/bin/../libs/zookeeper-3.6.3.jar:/opt/kafka/bin/../libs/zookeeper-jute-3.6.3.jar:/opt/kafka/bin/../libs/zstd-jni-1.5.2-1.jar (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.io.tmpdir=/tmp (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:java.compiler=<NA> (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.name=Linux (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.arch=aarch64 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.version=5.15.78-0-virt (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:user.name=kafka (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:user.home=/home/kafka (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:user.dir=/opt/kafka (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.memory.free=164MB (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.memory.max=2988MB (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO Server environment:os.memory.total=176MB (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.enableEagerACLCheck = false (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.skipACL=="yes", ACL checks will be skipped (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.digest.enabled = true (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.closeSessionTxn.enabled = true (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.flushDelay=0 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.maxWriteQueuePollTime=0 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.maxBatchSize=1000 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,891 INFO zookeeper.intBufferStartingSizeBytes = 1024 (org.apache.zookeeper.server.ZooKeeperServer) [main]
2023-05-09 10:37:16,893 INFO Using org.apache.zookeeper.server.watch.WatchManager as watch manager (org.apache.zookeeper.server.watch.WatchManagerFactory) [main]
2023-05-09 10:37:16,893 INFO Using org.apache.zookeeper.server.watch.WatchManager as watch manager (org.apache.zookeeper.server.watch.WatchManagerFactory) [main]
2023-05-09 10:37:16,893 INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zookeeper.server.ZKDatabase) [main]
2023-05-09 10:37:16,893 INFO zookeeper.commitLogCount=500 (org.apache.zookeeper.server.ZKDatabase) [main]
2023-05-09 10:37:16,896 INFO Using TLS encrypted quorum communication (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO Port unification disabled (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO multiAddress.enabled set to false (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO multiAddress.reachabilityCheckEnabled set to true (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO multiAddress.reachabilityCheckTimeoutMs set to 1000 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO QuorumPeer communication is not secured! (SASL auth disabled) (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,896 INFO quorum.cnxn.threads.size set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,897 INFO Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap) [main]
2023-05-09 10:37:16,898 INFO The digest value is empty in snapshot (org.apache.zookeeper.server.DataTree) [main]
2023-05-09 10:37:16,899 INFO Snapshot loaded in 3 ms, highest zxid is 0x0, digest is 1371985504 (org.apache.zookeeper.server.ZKDatabase) [main]
2023-05-09 10:37:16,900 INFO binding to port /127.0.0.1:12181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,928 INFO bound to port 12181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,928 INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,929 INFO bound to port 2181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
2023-05-09 10:37:16,929 INFO Using 4000ms as the quorum cnxn socket timeout (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
2023-05-09 10:37:16,932 INFO Election port bind maximum retries is infinite (org.apache.zookeeper.server.quorum.QuorumCnxManager) [main]
2023-05-09 10:37:16,933 INFO zookeeper.fastleader.minNotificationInterval=200 (org.apache.zookeeper.server.quorum.FastLeaderElection) [main]
2023-05-09 10:37:16,933 INFO zookeeper.fastleader.maxNotificationInterval=60000 (org.apache.zookeeper.server.quorum.FastLeaderElection) [main]
2023-05-09 10:37:16,934 INFO Creating TLS-only quorum server socket (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:16,935 INFO 2 is accepting connections now, my election bind port: lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:16,938 INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider) [main]
2023-05-09 10:37:16,945 INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:16,946 INFO New election. My id = 2, proposed zxid=0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:16,951 INFO Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [WorkerReceiver[myid=2]]
2023-05-09 10:37:17,121 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-1]
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(SSLSocketOutputRecord.java:271)
    at java.base/sun.security.ssl.HandshakeOutStream.flush(HandshakeOutStream.java:89)
    at java.base/sun.security.ssl.CertificateVerify$T12CertificateVerifyProducer.produce(CertificateVerify.java:773)
    at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(ServerHelloDone.java:182)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:17,152 INFO Notification time out: 400 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:17,190 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-2]
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:358)
    at java.base/sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:204)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:17,553 INFO Notification time out: 800 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:17,576 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-3]
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeChangeCipherSpec(SSLSocketOutputRecord.java:231)
    at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(OutputRecord.java:187)
    at java.base/sun.security.ssl.ChangeCipherSpec$T10ChangeCipherSpecProducer.produce(ChangeCipherSpec.java:118)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.onProduceFinished(Finished.java:393)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.produce(Finished.java:377)
    at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(ServerHelloDone.java:182)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:18,354 INFO Notification time out: 1600 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:18,366 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-3]
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeChangeCipherSpec(SSLSocketOutputRecord.java:231)
    at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(OutputRecord.java:187)
    at java.base/sun.security.ssl.ChangeCipherSpec$T10ChangeCipherSpecProducer.produce(ChangeCipherSpec.java:118)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.onProduceFinished(Finished.java:393)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.produce(Finished.java:377)
    at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(ServerHelloDone.java:182)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:19,955 INFO Notification time out: 3200 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:19,977 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-3]
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeChangeCipherSpec(SSLSocketOutputRecord.java:231)
    at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(OutputRecord.java:187)
    at java.base/sun.security.ssl.ChangeCipherSpec$T10ChangeCipherSpecProducer.produce(ChangeCipherSpec.java:118)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.onProduceFinished(Finished.java:393)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.produce(Finished.java:377)
    at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(ServerHelloDone.java:182)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:23,159 INFO Notification time out: 6400 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=2](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 10:37:23,191 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-3]
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeChangeCipherSpec(SSLSocketOutputRecord.java:231)
    at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(OutputRecord.java:187)
    at java.base/sun.security.ssl.ChangeCipherSpec$T10ChangeCipherSpecProducer.produce(ChangeCipherSpec.java:118)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.onProduceFinished(Finished.java:393)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.produce(Finished.java:377)
    at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(ServerHelloDone.java:182)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-05-09 10:37:25,559 INFO Received connection request from /10.42.0.38:40756 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:25,647 INFO Accepted TLS connection from lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc.cluster.local/10.42.0.38:40756 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:25,662 INFO Received connection request from /10.42.0.38:40766 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:25,670 INFO Accepted TLS connection from /10.42.0.38:40766 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [ListenerHandler-lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.42:3888]
2023-05-09 10:37:25,693 WARN Cannot open channel to 1 at election address lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc.cluster.local/10.42.0.38:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=2]-3]
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:358)
    at java.base/sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:204)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:386)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
lpalashevski commented 1 year ago

We can continue investigating but for the next release I am proposing to set the defaults back to 1 replica and allow users to specify more replicas for more advanced deployments, for local lab learning environment this is too much and unnecessary.

cc: @planetf1

planetf1 commented 1 year ago
planetf1 commented 1 year ago

for ref, zookeeper manages to get the tls connections setup just fine

2023-05-09 14:14:37,357 INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 14:14:37,357 INFO New election. My id = 1, proposed zxid=0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 14:14:37,364 INFO Creating TLS-only quorum server socket (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.251:3888]
2023-05-09 14:14:37,367 INFO 1 is accepting connections now, my election bind port: lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.251:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.251:3888]
2023-05-09 14:14:37,397 INFO Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [WorkerReceiver[myid=1]]
2023-05-09 14:14:37,598 INFO Notification time out: 400 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 14:14:37,841 INFO Received connection request from /10.42.0.252:52558 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.251:3888]
2023-05-09 14:14:37,998 INFO Notification time out: 800 (org.apache.zookeeper.server.quorum.FastLeaderElection) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=[0:0:0:0:0:0:0:0]:2181)]
2023-05-09 14:14:38,235 INFO Accepted TLS connection from lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc.cluster.local/10.42.0.252:52558 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [ListenerHandler-lab-strimzi-zookeeper-0.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.251:3888]
2023-05-09 14:14:38,256 INFO Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [WorkerReceiver[myid=1]]
2023-05-09 14:14:38,257 INFO Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 (org.apache.zookeeper.server.quorum.FastLeaderElection) [WorkerReceiver[myid=1]]
2023-05-09 14:14:38,280 INFO SSL handshake complete with lab-strimzi-zookeeper-1.lab-strimzi-zookeeper-nodes.default.svc/10.42.0.252:3888 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=1]-1]
2023-05-09 14:14:38,281 INFO Have smaller server identifier, so dropping the connection: (myId:1 --> sid:2) (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumConnectionThread-[myid=1]-1]
2023-05-09 14:14:38,292 INFO Received connection request from /10.42.0.252:52568 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-lab-strimzi-zookeeper-0.lab-strimzi-zook
 ➜  egeria git:(main) kubectl get pods
NAME                                              READY   STATUS    RESTARTS   AGE
lab-odpi-egeria-lab-uistatic-7755bf8677-7z4qj     1/1     Running   0          9m5s
lab-odpi-egeria-lab-ui-5f445ccd9-2sbgw            1/1     Running   0          9m5s
lab-odpi-egeria-lab-dev-0                         1/1     Running   0          9m5s
lab-odpi-egeria-lab-factory-0                     1/1     Running   0          9m5s
lab-odpi-egeria-lab-core-0                        1/1     Running   0          9m5s
lab-odpi-egeria-lab-datalake-0                    1/1     Running   0          9m4s
strimzi-cluster-operator-749fddd68-4gb7x          1/1     Running   0          9m5s
lab-strimzi-zookeeper-0                           1/1     Running   0          8m54s
lab-strimzi-zookeeper-1                           1/1     Running   0          8m54s
lab-odpi-egeria-lab-presentation-ddc6656f-m6vx8   1/1     Running   0          9m5s
lab-odpi-egeria-lab-jupyter-847d94c4bc-fc62k      1/1     Running   0          9m5s
lab-strimzi-kafka-1                               1/1     Running   0          8m26s
lab-strimzi-kafka-2                               1/1     Running   0          8m26s
lab-strimzi-kafka-0                               1/1     Running   0          8m26s
lab-strimzi-entity-operator-84b4cd4567-49t4k      3/3     Running   0          7m58s
planetf1 commented 1 year ago

It takes approx 13s from start to fully stable with a couple of retries later on in the sequence, which I believe is normal as the zookeeper nodes come up independently

planetf1 commented 1 year ago

So in summary, I do agree with change, and it could be borderline limits, but there's also going to likely be other scenarios causing issues - be it in how rancher behaves, or our charts. If the platform itself is starved of resources it may never get stable. Or there could be implementation issues with the processes under macOS. Or after k8s healthchecks continually fail, pod restarts may have returned stability. Helping the user debug/understand/know what is happening, and try to preempt through reducing footprint, improving docs etc is something we can always watch out for!

lpalashevski commented 1 year ago

Did additional test after your comments - I had 12G/4c originally, I just added +2c to match what you reported but I am getting same results unfortunately. I do not think this has to do with cluster resources at all (at least not from the pod logs) it looks like it requires extra attention with the ssl configuration when you scale zookeeper under some conditions (still unclean because it works on your stack)

We can confirm the same is reproduced at least on one more local development environment within our team.

On the more general point, I agree things can go wrong on may levels and different k8s environments - this is reason more the configuration for the labs should as minimal and simple as possible and its main purpose should be to demonstrate features/functionality.

planetf1 commented 1 year ago

@dwolfson did you see any issues with the multiple zookeepers ?

dwolfson commented 1 year ago

I have not had that particular issue today - but will look again. I have been having issues with the Egeria pods not coming up during a helm install - going into a crashloop-back-off loop and then finally dying. In the past, when I've seen this, I've found that rancher may have updated k8s - and backing off a release can help. Am about to try that..

Today, I was having a Kafka issue during ingestion tests - consistently about 20 min into the test..

I also agree that changing the replicas default back to 1 is a good idea.

Ok - bit of an update - looking in the Steve.log looks like there are a bunch of certificate errors - I've found going back a level of k8s sometimes clears this up. Nope - I was pointing to an old flavor of the V4 xtdb jar file...that fixed it.

planetf1 commented 1 year ago

Thanks @dwolfson. I'll add version clarification of what worked for me. If you have an issue with ingestion (ie after cluster has come up) might be worth tracking in a new issue [ I think we're all agreed going to 1 replica is appropriate for default though ]

I tend to update to latest before each test (and only back off if I hit issues). So my last test was on

lpalashevski commented 1 year ago

For the record, after additional tests performed we can conclude that the zookeeper certificates error is only happening for Kubernetes v1.25.0 most probably due to internal bug.

I can confirm that strimzi configuration running multiple replicas of zookeeper works on the current stable version of Kubernetes v1.26.4 and Rancher 1.8.1

image

Closing this issue.

lpalashevski commented 1 year ago

@planetf1 thanks for the help and managing to reproduce on your end.

planetf1 commented 1 year ago

1.25.0, 1.25.2, 1.25.3 all fail 1.25.4, 1.25.6, 1.26.4, 1.27.1 works 1.24.12 - zookeeper works, but the static UI fails!

not immediately obvious from https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1254 but it does look like an issue with k8s itself, or the platform.

Agree testing expicitly with the latest rancher desktop install, selecting 'stable' version is a good minimum requirement (anything extra in checking latest version is value on top) & that we need to focus on keeping simple things simple, and a sensible default config with any limitations clearly explained.