thingsboard / thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.
https://thingsboard.io
Apache License 2.0

kafka always error #10823

Open: xiddjp opened this issue 1 month ago

xiddjp commented 1 month ago

I use Kafka as my message queue. After the service has been running for a while, the tb_core service frequently throws exceptions. I have adjusted the following two Kafka parameters, but it did not help:

In kafka.yml:
    KAFKA_MESSAGE_MAX_BYTES: 33554432
In queue-kafka.env:
    TB_KAFKA_MAX_REQUEST_SIZE=33554432

However, the following errors still occur:
2024-05-16+13:25:23.043 [kafka-coordinator-heartbeat-thread | tb-core-node] ERROR org.apache.kafka.common.network.NetworkReceive - Allocating buffer of size 26211464 for source 2
2024-05-16+13:25:23.043 [kafka-coordinator-heartbeat-thread | tb-core-node] ERROR org.apache.kafka.common.network.NetworkReceive - Stack Trace: java.base/java.lang.Thread.getStackTrace(Thread.java:1602)|org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:125)|org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)|org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)|org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)|org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)|org.apache.kafka.common.network.Selector.poll(Selector.java:481)|org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)|org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)|org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:306)|org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1433)

Some error logs from ZooKeeper (the IP 192.168.32.1 appears to be the gateway IP of tb core and tb rule engine):
2024-05-16 14:22:43,194 [myid:1] - INFO [NIOWorkerThread-16:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:52408 as it has seen zxid 0x189c0000179f our last zxid is 0xc00000634 client must try another server
2024-05-16 14:22:43,352 [myid:1] - INFO [NIOWorkerThread-29:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:38874 as it has seen zxid 0x189d00000001 our last zxid is 0xc00000634 client must try another server
2024-05-16 14:22:43,502 [myid:1] - INFO [NIOWorkerThread-11:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:38876 as it has seen zxid 0x189d00000001 our last zxid is 0xc00000634 client must try another server
2024-05-16 14:22:43,518 [myid:1] - INFO [NIOWorkerThread-32:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:46678 as it has seen zxid 0x189d00000003 our last zxid is 0xc00000634 client must try another server
2024-05-16 14:22:43,791 [myid:1] - INFO [NIOWorkerThread-13:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:51388 as it has seen zxid 0x189c0000179f our last zxid is 0xc00000634 client must try another server
2024-05-16 14:22:43,893 [myid:1] - INFO [NIOWorkerThread-3:ZooKeeperServer@1047] - Refusing session request for client /192.168.32.1:46682 as it has seen zxid 0x189d00000003 our last zxid is 0xc00000634 client must try another server
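The numbers in this report can be compared directly. 33554432 bytes is exactly 32 MiB, while the 26211464-byte buffer in the tb_core log is just under 25 MiB, so that allocation was already below the limits that were raised. This fits the reply below, which describes the message as a size-threshold log rather than a limit being hit. A minimal Java check of that arithmetic, using only the values quoted above (the class name is hypothetical):

```java
// Compares the buffer size from the tb_core log with the limit configured
// in kafka.yml and queue-kafka.env. All values are copied from this issue.
public class BufferSizeCheck {
    static final long MIB = 1024L * 1024L;

    // KAFKA_MESSAGE_MAX_BYTES and TB_KAFKA_MAX_REQUEST_SIZE as configured above
    static final long CONFIGURED_LIMIT_BYTES = 33_554_432L;

    // "Allocating buffer of size 26211464 for source 2" from the tb_core log
    static final long LOGGED_BUFFER_BYTES = 26_211_464L;

    // Converts a byte count to mebibytes for readable output
    static double toMiB(long bytes) {
        return (double) bytes / MIB;
    }

    public static void main(String[] args) {
        System.out.printf("configured limit: %.3f MiB%n", toMiB(CONFIGURED_LIMIT_BYTES));
        System.out.printf("logged buffer:    %.3f MiB%n", toMiB(LOGGED_BUFFER_BYTES));
        System.out.println("buffer below limit: "
                + (LOGGED_BUFFER_BYTES < CONFIGURED_LIMIT_BYTES));
    }
}
```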

ViacheslavKlimov commented 1 month ago

These logs only indicate that the size of the records returned from a single poll exceeds a certain threshold. The log level should be "WARN", since no error actually occurs and the messages are processed anyway. However, consider decreasing TB_QUEUE_KAFKA_MAX_POLL_RECORDS to avoid getting overly large results from a poll.
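The suggested fix is a cap on the number of records per poll. A minimal Java sketch follows, assuming (this thread does not confirm the mapping) that ThingsBoard forwards TB_QUEUE_KAFKA_MAX_POLL_RECORDS to the Kafka consumer's standard max.poll.records property. PollRecordsConfig and its methods are hypothetical names, not ThingsBoard code, and the fallback of 500 is simply the value suggested later in this thread, not a ThingsBoard default.

```java
import java.util.Properties;

// Hypothetical sketch, not ThingsBoard code: builds Kafka consumer properties
// with a capped max.poll.records read from TB_QUEUE_KAFKA_MAX_POLL_RECORDS.
public class PollRecordsConfig {
    static final String ENV_VAR = "TB_QUEUE_KAFKA_MAX_POLL_RECORDS";

    // Value suggested in this thread; assumed fallback, not a ThingsBoard default
    static final int SUGGESTED_MAX_POLL_RECORDS = 500;

    // Parses the environment value, falling back to the suggested value when
    // it is unset, blank, not a number, or not positive.
    static int resolveMaxPollRecords(String envValue) {
        if (envValue == null || envValue.isBlank()) {
            return SUGGESTED_MAX_POLL_RECORDS;
        }
        try {
            int parsed = Integer.parseInt(envValue.trim());
            return parsed > 0 ? parsed : SUGGESTED_MAX_POLL_RECORDS;
        } catch (NumberFormatException e) {
            return SUGGESTED_MAX_POLL_RECORDS;
        }
    }

    // "max.poll.records" is the standard Kafka consumer property that caps
    // how many records a single poll() call returns.
    static Properties consumerProperties(String envValue) {
        Properties props = new Properties();
        props.put("max.poll.records", String.valueOf(resolveMaxPollRecords(envValue)));
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProperties(System.getenv(ENV_VAR));
        System.out.println("max.poll.records = " + props.getProperty("max.poll.records"));
    }
}
```

In a Docker Compose setup like the reporter's, the equivalent change would presumably be a TB_QUEUE_KAFKA_MAX_POLL_RECORDS=500 line in queue-kafka.env, next to TB_KAFKA_MAX_REQUEST_SIZE, followed by a restart of the affected services.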

xiddjp commented 1 month ago

However, whenever this error log appears, data updates on the page are delayed.

xiddjp commented 1 month ago

@ViacheslavKlimov Can I increase the value of TB_LOG_REQUESTED_BUFFER_SIZE in NetworkReceive.java?

ViacheslavKlimov commented 1 month ago

This only raises the threshold at which this log appears. It is better to adjust TB_QUEUE_KAFKA_MAX_POLL_RECORDS, for example, setting it to 500.
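To illustrate why raising the threshold does not help, here is a simplified, hypothetical Java model; it is not ThingsBoard's NetworkReceive code. It treats the data returned by one poll as records per poll times an average record size, and the log as a plain comparison against a threshold. The class name, the 2048-byte average record size, the 10 MiB threshold, and the 10000-record starting value are all made-up illustration values.

```java
// Simplified, hypothetical model of the two options discussed in this thread.
// NOT ThingsBoard's NetworkReceive code; all numbers in main() are made up.
public class PollSizeVsLogThreshold {

    // Rough size of one poll result: records returned times average record size
    static long pollBytes(int maxPollRecords, long avgRecordBytes) {
        return (long) maxPollRecords * avgRecordBytes;
    }

    // The log fires when the size exceeds the threshold; the threshold itself
    // does not change how much data is received.
    static boolean wouldLog(long sizeBytes, long logThresholdBytes) {
        return sizeBytes > logThresholdBytes;
    }

    public static void main(String[] args) {
        long avgRecordBytes = 2_048;              // hypothetical average record size
        long threshold = 10L * 1024 * 1024;       // hypothetical log threshold (10 MiB)
        long raisedThreshold = 32L * 1024 * 1024; // hypothetical raised threshold (32 MiB)

        long large = pollBytes(10_000, avgRecordBytes); // hypothetical high record cap
        long small = pollBytes(500, avgRecordBytes);    // cap suggested in this thread

        System.out.printf("10000 records: %d bytes, logged=%b%n",
                large, wouldLog(large, threshold));
        System.out.printf("500 records: %d bytes, logged=%b%n",
                small, wouldLog(small, threshold));
        System.out.printf("10000 records, raised threshold: %d bytes, logged=%b%n",
                large, wouldLog(large, raisedThreshold));
    }
}
```

In this model, lowering the record cap from 10000 to 500 cuts the per-poll payload from 20480000 to 1024000 bytes, whereas raising the threshold to 32 MiB keeps the payload at 20480000 bytes and only stops the log from appearing.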