Describe the bug
The metrics endpoint is continuously failing. Node utilization is high, but the nodes still have room in CPU and memory:
~ » k top nodes
NAME                                           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-fianu-prod-fianu-node-pool-ef36cf4e-gjhf   2716m        69%    10440Mi         78%
gke-fianu-prod-fianu-node-pool-ef36cf4e-jlcx   2468m        62%    13899Mi         104%
gke-fianu-prod-fianu-node-pool-ef36cf4e-vtcr   3977m        45%    12308Mi         92%
gke-fianu-prod-fianu-node-pool-ef36cf4e-wsdw   2518m        64%    9113Mi          68%
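In case it helps with diagnosis, this is the rough set of checks I can run against the data plane next (a sketch only; it assumes the Kafka broker data plane is installed in the default knative-eventing namespace with the stock kafka-broker-receiver deployment name, which may differ per install):

# check the data-plane pods and pull recent receiver logs
kubectl -n knative-eventing get pods
kubectl -n knative-eventing logs deploy/kafka-broker-receiver --tail=200

# read the receiver's declared container ports to locate the metrics endpoint before probing it
kubectl -n knative-eventing get deploy kafka-broker-receiver \
  -o jsonpath='{.spec.template.spec.containers[0].ports}'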
Seeing lots of:
org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept
{"@timestamp":"2024-04-25T17:47:43.004Z","@version":"1","message":"Failed to produce record path=/dxcm/default","logger_name":"dev.knative.eventing.kafka.broker.receiver.impl.handler.IngressRequestHandlerImpl","thread_name":"vert.x-eventloop-thread-2","level":"WARN","level_value":30000,"stack_trace":"org.apache.kafka.common.errors.TimeoutException: Topic knative-broker-dxcm-default not present in metadata after 60000 ms.\n","path":"/dxcm/default"}
{"@timestamp":"2024-04-25T18:32:34.27Z","@version":"1","message":"Failed to send record topic=knative-broker-d-default {}","logger_name":"dev.knative.eventing.kafka.broker.receiver.impl.handler.IngressRequestHandlerImpl","thread_name":"vert.x-eventloop-thread-0","level":"ERROR","level_value":40000,"stack_trace":"org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for knative-broker-d-default-7:120000 ms has passed since batch creation\n","topic":"knative-broker-dxcm-default"}
{"@timestamp":"2024-04-25T18:32:34.27Z","@version":"1","message":"Failed to produce record path=/d/default","logger_name":"dev.knative.eventing.kafka.broker.receiver.impl.handler.IngressRequestHandlerImpl","thread_name":"vert.x-eventloop-thread-0","level":"WARN","level_value":30000,"stack_trace":"org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for knative-broker-d-default-7:120000 ms has passed since batch creation\n","path":"/d/default"}
{"@timestamp":"2024-04-25T18:32:37.025Z","@version":"1","message":"[Producer clientId=producer-1] Disconnecting from node 2 due to socket connection setup timeout. The timeout value is 31485 ms.","logger_name":"org.apache.kafka.clients.NetworkClient","thread_name":"kafka-producer-network-thread | producer-1","level":"INFO","level_value":20000}
Expected behavior
No error messages in the logs, and the configured max message size of 20 MB would be respected.
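As far as I understand, the 20 MB limit has to agree on both sides: the data-plane producer (max.request.size) and the topic on Confluent (max.message.bytes). A sketch of how I intend to check both, assuming the producer properties live in the config-kafka-broker-data-plane ConfigMap (the ConfigMap name and key may differ between releases, so treat this as a guess rather than the documented knob):

# producer side: look for max.request.size in the data-plane producer properties
kubectl -n knative-eventing get configmap config-kafka-broker-data-plane -o yaml \
  | grep -i 'max.request.size'

# topic side: check the effective max.message.bytes on Confluent (20 MB = 20971520 bytes)
kafka-configs.sh --bootstrap-server <bootstrap-host>:9092 \
  --command-config client.properties \
  --entity-type topics --entity-name knative-broker-dxcm-default --describe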
To Reproduce
Upgraded from 1.2 to 1.3.8
Knative release version
1.3.8
Additional context
Some of these log messages may have already been present in the past, but I am providing everything I see to better understand where I need to diagnose.
We are connecting to a dedicated Kafka instance running in Confluent Kafka. Could it be that the topics need to be externally managed?
We have been having a multitude of issues since upgrading to the latest version, 1.3.8.
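If the answer is that topics on a dedicated Confluent cluster should be managed externally (for example because auto-creation is not permitted), this is a sketch of pre-creating the topic the data plane is looking for; the name follows the knative-broker-<namespace>-<broker> pattern seen in the logs above, and the partition/replication values are illustrative:

# pre-create the broker topic with a 20 MB message limit (20971520 bytes)
kafka-topics.sh --bootstrap-server <bootstrap-host>:9092 \
  --command-config client.properties --create \
  --topic knative-broker-dxcm-default \
  --partitions 10 --replication-factor 3 \
  --config max.message.bytes=20971520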