hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0
1.49k stars 815 forks source link

Private network's performance decreases when Kakfa problem appears. #5595

Open jorgesanjose opened 1 year ago

jorgesanjose commented 1 year ago

Description

In a private Besu network with Kafka Log4J appender set in log config, when Kakfa servers crash or the nodes are not possibly to connect to them, the logs of the nodes starts to show so many WARN/INFO logs about kafka and performance of the networks starts to fall, block time increases up 35-40 seconds and some transactions are not sent.

Steps to Reproduce (Bug)

  1. Private Besu network running with Kafka appender for Log4J set in config.
  2. Kafka servers crash or not avaliable
  3. Network's performance start to decrease

Expected behavior: Although the logs show the Kafka's warning messages, the performance of the network should not be affected.

Actual behavior: Network's performance start to decrease.

Logs

Here starts the warning logs.

2023-06-13 11:25:00.126+02:00 | EthScheduler-Workers-66018 | INFO  | PersistBlockTask | Imported #15,293,893 / 0 tx / 0 om / 0 (0.0%) gas / (0x5145aec319d7c6f69a085e144f19931f39afd67867a1eabc0319f3052936fe6
2023-06-13 11:25:05.160+02:00 | EthScheduler-Workers-66019 | INFO  | PersistBlockTask | Imported #15,293,894 / 0 tx / 0 om / 0 (0.0%) gas / (0x899530055a74b3f2b4ac664cdb509d31dcbf24bb618e8a1ec953763e130e4cf
2023-06-13 11:25:10.164+02:00 | EthScheduler-Workers-66020 | INFO  | PersistBlockTask | Imported #15,293,895 / 0 tx / 0 om / 0 (0.0%) gas / (0xc0d12c17975246e40aca0c1dcd83a1cd884e5aaf383d2edd0448e2f8d5be3df
2023-06-13 11:25:15.161+02:00 | EthScheduler-Workers-66021 | INFO  | PersistBlockTask | Imported #15,293,896 / 0 tx / 0 om / 0 (0.0%) gas / (0x74b555e88808b4f9fd5d0587f3f9bc834b37552dc1bffe67d776d158ca09edd
2023-06-13 11:25:20.139+02:00 | EthScheduler-Workers-66022 | INFO  | PersistBlockTask | Imported #15,293,897 / 0 tx / 0 om / 0 (0.0%) gas / (0x36dc60d2783adf67e749d8b3da3822ec578666fefbc270902d32e4bd2f5590f
2023-06-13 11:25:25.133+02:00 | EthScheduler-Workers-66023 | INFO  | PersistBlockTask | Imported #15,293,898 / 0 tx / 0 om / 0 (0.0%) gas / (0xf213a57929e6949ed9355014a66d188162917d562cb096482a03c0884430642
2023-06-13 11:25:30.187+02:00 | EthScheduler-Workers-66024 | INFO  | PersistBlockTask | Imported #15,293,899 / 0 tx / 0 om / 0 (0.0%) gas / (0x513bb5725d93bef62ed612230728d076c8e24fb9d79926c89b36ab719c2f46d
2023-06-13 11:25:35.166+02:00 | EthScheduler-Workers-66025 | INFO  | PersistBlockTask | Imported #15,293,900 / 0 tx / 0 om / 0 (0.0%) gas / (0x0bee2f641493d16cc7659d7c246481d86228f6077b65dbc88dc483951dacca5
2023-06-13 11:25:40.161+02:00 | EthScheduler-Workers-66026 | INFO  | PersistBlockTask | Imported #15,293,901 / 0 tx / 0 om / 0 (0.0%) gas / (0x2ce9de04c4f01496b52580400df73db173cc9f0b93952ae245689fe1e74c427
2023-06-13 11:25:45.193+02:00 | EthScheduler-Workers-66027 | INFO  | PersistBlockTask | Imported #15,293,902 / 0 tx / 0 om / 0 (0.0%) gas / (0x5cca66c9722a43f6b255ae3dffb901af976d7812f8f8979043bc775eeeaec8f
2023-06-13 11:25:50.400+02:00 | EthScheduler-Workers-66028 | INFO  | PersistBlockTask | Imported #15,293,903 / 0 tx / 0 om / 0 (0.0%) gas / (0xb3d6e2815c4b7944b86752760e44a5285e6b46a68d88a16e3a005d9183656bc
2023-06-13 11:25:55.201+02:00 | EthScheduler-Workers-66029 | INFO  | PersistBlockTask | Imported #15,293,904 / 0 tx / 0 om / 0 (0.0%) gas / (0xb34b1dad5e776d63ef6fd43a1578e7d4dab62bfb6508f1d50f15a3b4c2cb08c
2023-06-13 11:26:00.117+02:00 | EthScheduler-Workers-66030 | INFO  | PersistBlockTask | Imported #15,293,905 / 0 tx / 0 om / 0 (0.0%) gas / (0x6c7853e469defcb377b875a71a52419a555a71474c43b28f7605f8105a3692e
2023-06-13 11:26:05.126+02:00 | EthScheduler-Workers-66031 | INFO  | PersistBlockTask | Imported #15,293,906 / 0 tx / 0 om / 0 (0.0%) gas / (0x48fcbb4b9b1aea0365214d1ac5b2b06e7953b9430e7b44a6bba41971a56f20f
2023-06-13 11:26:09.793+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:23.816+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 2 disconnected.
2023-06-13 11:26:33.422+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Disconnecting from node 4 due to socket connection setup timeout. The time
2023-06-13 11:26:40.239+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:40.241+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:40.395+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:40.397+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:40.682+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:40.683+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:41.176+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:41.177+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:41.958+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:41.959+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:43.053+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:43.055+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:43.708+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Disconnecting from node 5 due to socket connection setup timeout. The time
2023-06-13 11:26:43.857+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 1 disconnected.
2023-06-13 11:26:43.857+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 1 (/180.88.19.49:9096) could not be established. Broker
2023-06-13 11:26:44.013+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:26:44.013+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker
2023-06-13 11:26:44.161+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 2 disconnected.
2023-06-13 11:26:44.162+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 2 (/180.88.19.48:9096) could not be established. Broker
2023-06-13 11:26:44.314+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 4 disconnected.
2023-06-13 11:26:44.315+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 4 (/180.88.19.46:9096) could not be established. Broker

And block time starts to increase.

2023-06-13 11:27:09.603+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 1 disconnected.
2023-06-13 11:27:09.604+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 1 (/180.88.19.49:9096) could not be established. Broker may not be available.
2023-06-13 11:27:10.112+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 5 disconnected.
2023-06-13 11:27:10.117+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 5 (/180.88.19.45:9096) could not be established. Broker may not be available.
2023-06-13 11:26:40.174+02:00 | EthScheduler-Workers-66032 | INFO  | PersistBlockTask | Imported #15,293,907 / 0 tx / 0 om / 0 (0.0%) gas / (0xbad96f54c13c9d139f3a65ce3d90351ed6faef9887dac27942c62e7491c53a70) in 0.054s. Peers: 3 **
2023-06-13 11:27:10.460+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:27:10.461+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:27:10.686+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 4 disconnected.
2023-06-13 11:27:10.687+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 4 (/180.88.19.46:9096) could not be established. Broker may not be available.
2023-06-13 11:27:10.894+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 2 disconnected.
...
...
...
...
2023-06-13 11:27:36.555+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:27:37.799+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:27:37.800+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:27:38.750+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:27:38.750+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:27:39.937+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:27:39.939+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:27:10.223+02:00 | EthScheduler-Workers-66032 | INFO  | PersistBlockTask | Imported #15,293,908 / 0 tx / 0 om / 0 (0.0%) gas / (0x4d42d9aac73eb77e03df28c08df44c62be5f61b00c8059e079d1da5ce65d52c7) in 0.023s. Peers: 3
2023-06-13 11:27:41.103+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:27:41.104+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.

More logs

2023-06-13 11:59:15.399+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 4 disconnected.
2023-06-13 11:59:15.400+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 4 (/180.88.19.46:9096) could not be established. Broker may not be available.
2023-06-13 11:59:15.909+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:59:15.910+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:59:17.018+02:00 | kafka-producer-network-thread | producer-1 | INFO  | NetworkClient | [Producer clientId=producer-1] Node 3 disconnected.
2023-06-13 11:59:17.018+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Connection to node 3 (/180.88.19.47:9096) could not be established. Broker may not be available.
2023-06-13 11:59:17.408+02:00 | EthScheduler-Workers-67139 | INFO  | PersistBlockTask | Imported #15,294,000 / 0 tx / 0 om / 0 (0.0%) gas / (0xe7461c7cc6987e4309317f695fa7719b01923892c8c7bd964dfcfd7a72ba6ebc) in 0.011s. Peers: 5
2023-06-13 11:59:18.997+02:00 | kafka-producer-network-thread | producer-1 | WARN  | NetworkClient | [Producer clientId=producer-1] Error while fetching metadata with correlation id 70007 : {okapi-logs=LEADER_NOT_AVAILABLE}
2023-06-13 11:59:18.997+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Got error produce response with correlation id 70008 on topic-partition okapi-logs-0, retrying (2147483646 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION
2023-06-13 11:59:18.998+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Received unknown topic or partition error in produce request on partition okapi-logs-0. The topic-partition may not exist or the user may not have Describe access to it
2023-06-13 11:59:18.998+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Got error produce response with correlation id 70009 on topic-partition okapi-logs-0, retrying (2147483646 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION
2023-06-13 11:59:18.998+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Received unknown topic or partition error in produce request on partition okapi-logs-0. The topic-partition may not exist or the user may not have Describe access to it
2023-06-13 11:59:18.998+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Got error produce response with correlation id 70010 on topic-partition okapi-logs-0, retrying (2147483646 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION
2023-06-13 11:59:18.999+02:00 | kafka-producer-network-thread | producer-1 | WARN  | Sender | [Producer clientId=producer-1] Received unknown topic or partition error in produce request on partition okapi-logs-0. The topic-partition may not exist or the user may not have Describe access to it

When i've commented the Kakfa appender of all the nodes, the performance went back to normal.

2023-06-13 12:23:59.047+02:00 | EthScheduler-Workers-67313 | INFO  | PersistBlockTask | Imported #15,294,187 / 0 tx / 0 om / 0 (0.0%) gas / (0x8ec09bf0dbc96aadb9351d5869fac4dd5b845c398d857f16ddb83263c88d39f8) in 0.009s. Peers: 2
2023-06-13 12:24:04.078+02:00 | EthScheduler-Workers-67314 | INFO  | PersistBlockTask | Imported #15,294,188 / 0 tx / 0 om / 0 (0.0%) gas / (0xa5845014fee124ab8ade707970797e50bdde235eaa843167b3ff5fd7da8eba15) in 0.007s. Peers: 2
2023-06-13 12:24:09.075+02:00 | EthScheduler-Workers-67315 | INFO  | PersistBlockTask | Imported #15,294,189 / 0 tx / 0 om / 0 (0.0%) gas / (0xfa2716d1bf7597f4a01feec245d74b362c28702ade1ee59bbc3497011348fddd) in 0.007s. Peers: 2
2023-06-13 12:24:14.058+02:00 | EthScheduler-Workers-67316 | INFO  | PersistBlockTask | Imported #15,294,190 / 0 tx / 0 om / 0 (0.0%) gas / (0x7deee0702d33a76aefdabbf2e50b6bb641ffcc2bae58c8c06045be9b090489a6) in 0.009s. Peers: 2
2023-06-13 12:24:19.065+02:00 | EthScheduler-Workers-67317 | INFO  | PersistBlockTask | Imported #15,294,191 / 0 tx / 0 om / 0 (0.0%) gas / (0xf2966cbeef782788c6e660fe432b0553da4107b19098cae0672730c9752cdcf2) in 0.007s. Peers: 2
2023-06-13 12:24:24.106+02:00 | EthScheduler-Workers-67318 | INFO  | PersistBlockTask | Imported #15,294,192 / 0 tx / 0 om / 0 (0.0%) gas / (0x2b028d3b91c557ca6e9121a66c828502af6edb693e1be9ac4ed0f67d7b5d67c2) in 0.006s. Peers: 2
2023-06-13 12:24:29.079+02:00 | EthScheduler-Workers-67319 | INFO  | PersistBlockTask | Imported #15,294,193 / 0 tx / 0 om / 0 (0.0%) gas / (0x9fbd9eecab593003649dbef0a43ce8938edb0cd6403f0709c5e3bdc077d0f24c) in 0.006s. Peers: 2
2023-06-13 12:24:34.044+02:00 | EthScheduler-Workers-67320 | INFO  | PersistBlockTask | Imported #15,294,194 / 0 tx / 0 om / 0 (0.0%) gas / (0x82af043ee14e67b6bfac9d0f267fda0d024749819cd4c092b2ef630251842c64) in 0.006s. Peers: 2
2023-06-13 12:24:39.075+02:00 | EthScheduler-Workers-67321 | INFO  | PersistBlockTask | Imported #15,294,195 / 0 tx / 0 om / 0 (0.0%) gas / (0x3bd40d8529a397de5d6e4c40489b2805d005dcb2e70dd2f414a31c221fe39cd4) in 0.009s. Peers: 2

Versions (Add all that apply)

I don't know if for a logging service problem, the normal network's performance have to fail like this. Any ideas?

Thanks in advance.

jframe commented 1 year ago

Thanks for the report, we are not prioritising this right now.