linkedin / kafka-monitor

Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics - E2E latency, service produce/consume availability, offsets commit availability & latency, message loss rate and more.
https://engineering.linkedin.com/blog/2016/05/open-sourcing-kafka-monitor
Apache License 2.0

ERROR Exception occurred while getting the topicDescriptionKafkaFuture for topic #347

Open antobaldu opened 3 years ago

antobaldu commented 3 years ago

Hello, I created the topics in advance (creating topics on the fly is not allowed here). The brokers and ZooKeeper nodes are accessible only over SSL.
I have this configuration:

{
    "single-cluster-monitor": {
        "class.name": "com.linkedin.xinfra.monitor.apps.SingleClusterMonitor",
        "topic": "appl.xinfra-monitor.e1.test.v0",
        "group.id": "appl.xinfra-monitor.e1.test.v0.balduzzia", 
        "zookeeper.connect": "e1-kafkazookeeper-alsu001.pnet.ch:2182,e1-kafkazookeeper-alsu002.pnet.ch:2182,e1-kafkazookeeper-alsu003.pnet.ch:2182",
        "bootstrap.servers": "e1-kafka-pfnet-lab.pnet.ch:7001",
        "request.timeout.ms": 9000,
        "produce.record.delay.ms": 100,
        "topic-management.topicCreationEnabled": false,
        "topic-management.replicationFactor" : 1,
        "topic-management.partitionsToBrokersRatio" : 2.0,
        "topic-management.rebalance.interval.ms" : 600000,
        "topic-management.preferred.leader.election.check.interval.ms" : 300000,
        "topic-management.topicFactory.props": {
        },
        "topic-management.topic.props": {
            "retention.ms": "3600000"
        },
        "produce.producer.props": {
            "client.id": "kmf-client-id",
            "producer.security.protocol": "SSL",
            "producer.ssl.keystore.location": "/var/spool/keybox/kafkaauth-balduzzia-e1/node_keystore.jks",
            "producer.ssl.keystore.password": "xxxxxx",
            "producer.ssl.truststore.location": "/var/spool/keybox/pki-all/truststore.jks",
            "producer.ssl.truststore.password": "passphrase"
        },
        "consume.latency.sla.ms": "20000",
        "consume.consumer.props": {
            "consumer.security.protocol": "SSL",
            "consumer.ssl.keystore.location": "/var/spool/keybox/kafkaauth-balduzzia-e1/node_keystore.jks",
            "consumer.ssl.keystore.password": "xxxxxx",
            "consumer.ssl.truststore.location": "/var/spool/keybox/pki-all/truststore.jks",
            "consumer.ssl.truststore.password": "passphrase"
        }
    }
}

I receive this error. What could be the problem?

[2021-05-31 15:34:55,242] INFO single-cluster-monitor/ProduceService is initialized. (com.linkedin.xinfra.monitor.services.ProduceService)
[2021-05-31 15:34:55,264] INFO CommitLatencyMetrics was constructed successfully. (com.linkedin.xinfra.monitor.services.metrics.CommitLatencyMetrics)
[2021-05-31 15:34:55,265] INFO CommitAvailabilityMetrics called. (com.linkedin.xinfra.monitor.services.metrics.CommitAvailabilityMetrics)
[2021-05-31 15:34:55,301] INFO Topic management periodical procedure started with initial delay 0 ms and interval 600000 ms (com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService)
[2021-05-31 15:34:55,301] INFO Preferred leader election periodical procedure started with initial delay 300000 ms and interval 300000 ms (com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService)
[2021-05-31 15:34:55,633] INFO Topic creation is not enabled for appl.xinfra-monitor.e1.test.v0 in a cluster with Zookeeper URL e1-kafkazookeeper-alsu001.pnet.ch:2182,e1-kafkazookeeper-alsu002.pnet.ch:2182,e1-kafkazookeeper-alsu003.pnet.ch:2182. Refer to config: topic-management.topicCreationEnabled (com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService)
[2021-05-31 15:35:04,638] ERROR Topic-management-service-for-single-cluster-monitor/MultiClusterTopicManagementService will stop due to error. (com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService)
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) ~[kafka-clients-2.4.0.jar:?]
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) ~[kafka-clients-2.4.0.jar:?]
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) ~[kafka-clients-2.4.0.jar:?]
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) ~[kafka-clients-2.4.0.jar:?]
        at com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService$TopicManagementHelper.minPartitionNum(MultiClusterTopicManagementService.java:324) ~[kafka-monitor-2.5.11-SNAPSHOT.jar:2.5.11-SNAPSHOT]
        at com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService$TopicManagementRunnable.run(MultiClusterTopicManagementService.java:191) [kafka-monitor-2.5.11-SNAPSHOT.jar:2.5.11-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
[2021-05-31 15:35:04,651] INFO Topic-management-service-for-single-cluster-monitor/MultiClusterTopicManagementService stopped. (com.linkedin.xinfra.monitor.services.MultiClusterTopicManagementService)

Regards, Antonella

github-actions[bot] commented 3 years ago

This is your first issue in the repository. Thank you for raising this issue.

Darkest commented 3 years ago

You should put the SSL configs in the "single-cluster-monitor" section itself as well as in "produce.producer.props" and "consume.consumer.props". I had a similar problem, and that helped. Also, just to mention: the repo looks rather abandoned and poorly documented. Is the project going to be maintained and developed?

christofluethi commented 3 years ago

Yes, that actually solved the issue. Thanks.
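
For anyone hitting the same timeout, the layout problem can be checked mechanically before starting the monitor. Below is a minimal Python sketch, not part of Xinfra Monitor: `find_ssl_config_problems` is a hypothetical helper that only inspects the JSON structure. It assumes, based on this thread, that the SSL settings must appear at the top level of the monitor section (presumably because the topic management service builds its admin client from those settings) and that keys inside the producer/consumer props should use the plain Kafka client names without a `producer.` or `consumer.` prefix.

```python
# Hypothetical helper, not part of Xinfra Monitor: it only inspects the JSON layout.
CLIENT_SECTIONS = ("produce.producer.props", "consume.consumer.props")
ROLE_PREFIXES = ("producer.", "consumer.")
SSL_KEY_PREFIXES = ("security.", "ssl.")


def find_ssl_config_problems(monitor_config):
    """Return a list of warnings for one monitor section (a dict)."""
    problems = []

    # Assumption from this thread: SSL settings are also needed at the top
    # level of the section, not only inside the producer/consumer props.
    if "security.protocol" not in monitor_config:
        problems.append("top level: missing security.protocol")

    for section in CLIENT_SECTIONS:
        for key in monitor_config.get(section, {}):
            for prefix in ROLE_PREFIXES:
                bare_key = key[len(prefix):]
                # e.g. "producer.ssl.keystore.location" -> "ssl.keystore.location"
                if key.startswith(prefix) and bare_key.startswith(SSL_KEY_PREFIXES):
                    problems.append(
                        f"{section}: '{key}' should probably be '{bare_key}'"
                    )
    return problems


if __name__ == "__main__":
    import json
    import sys

    # Usage: python check_config.py xinfra-monitor.properties
    with open(sys.argv[1]) as handle:
        config = json.load(handle)
    for name, section in config.items():
        for problem in find_ssl_config_problems(section):
            print(f"{name}: {problem}")
```

Run against the configuration from the original post, this would flag the missing top-level `security.protocol` and every `producer.`/`consumer.` prefixed SSL key. It does not contact the brokers, so it cannot confirm that the keystore or truststore themselves are valid.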

antobaldu commented 3 years ago

This is the configuration. In our case, the topic ${TOPIC} already exists.

"single-cluster-monitor": {
        "class.name": "com.linkedin.xinfra.monitor.apps.SingleClusterMonitor",
        "topic": "${TOPIC}",
        "zookeeper.connect": "${ZOOKEEPER_CONNECT}",
        "bootstrap.servers": "${BOOTSTRAP_SERVERS}",
        "request.timeout.ms": 3000,
        "produce.record.delay.ms": 1000,
        "topic-management.topicManagementEnabled": false,
        "topic-management.topicCreationEnabled": false,
        "topic-management.topicAddPartitionEnabled": false,
        "topic-management.topicReassignPartitionAndElectLeaderEnabled": false,
        "client.id": "xinfra-monitor-adminclient",
        "security.protocol": "SSL",
        "ssl.keystore.location": "/etc/secrets/node_keystore.p12",
        "ssl.keystore.password": "${KAFKA_KEYSTORE_PASSWORD}",
        "ssl.keystore.type": "PKCS12",
        "ssl.truststore.location": "/etc/secrets/truststore.p12",
        "ssl.truststore.password": "${KAFKA_TRUSTSTORE_PASSWORD}",
        "ssl.truststore.type": "PKCS12",
        "produce.producer.props": {
          "client.id": "xinfra-monitor-producer",
          "security.protocol": "SSL",
          "ssl.keystore.location": "/etc/secrets/node_keystore.p12",
          "ssl.keystore.password": "${KAFKA_KEYSTORE_PASSWORD}",
          "ssl.keystore.type": "PKCS12",
          "ssl.truststore.location": "/etc/secrets/truststore.p12",
          "ssl.truststore.password": "${KAFKA_TRUSTSTORE_PASSWORD}",
          "ssl.truststore.type": "PKCS12"
        },
        "consume.latency.sla.ms": "${CONSUME_LATENCY_SLA}",
        "consume.consumer.props": {
          "client.id": "xinfra-monitor-consumer",
          "group.id": "${GROUP_ID}",
          "security.protocol": "SSL",
          "ssl.keystore.location": "/etc/secrets/node_keystore.p12",
          "ssl.keystore.password": "${KAFKA_KEYSTORE_PASSWORD}",
          "ssl.keystore.type": "PKCS12",
          "ssl.truststore.location": "/etc/secrets/truststore.p12",
          "ssl.truststore.password": "${KAFKA_TRUSTSTORE_PASSWORD}",
          "ssl.truststore.type": "PKCS12"
        }
      },
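
The thread does not say how the `${...}` placeholders in this configuration are filled in. One possible approach, shown here only as a sketch, is to expand them from environment variables before handing the file to the monitor. The function name `render_config` is hypothetical; `string.Template` is from the Python standard library and happens to use the same `${NAME}` syntax.

```python
import json
import os
from string import Template


def render_config(template_text, env=None):
    """Expand ${NAME} placeholders, then parse the result as JSON."""
    env = os.environ if env is None else env
    # substitute() raises KeyError when a placeholder has no value, so a
    # missing password fails here instead of reaching the Kafka clients.
    rendered = Template(template_text).substitute(env)
    # Parsing also catches values that break the JSON, e.g. an unescaped quote.
    return json.loads(rendered)


if __name__ == "__main__":
    import sys

    # Usage: python render_config.py template.json > xinfra-monitor.properties
    with open(sys.argv[1]) as handle:
        print(json.dumps(render_config(handle.read()), indent=2))
```

Two caveats with this approach: a literal `$` elsewhere in the template has to be written as `$$`, and substituted values are inserted as raw text, so a value such as `${CONSUME_LATENCY_SLA}` inside quotes stays a JSON string.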