Closed: estin closed this issue 3 years ago
It sounds like idle TCP connections are being closed somewhere along the way. Can you add the following option to your brod client configuration:
{extra_sock_opts, [{keepalive, true}]}
Also, it looks like you're using a plaintext connection. Do you see the same error when using SSL?
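A minimal sketch of where that option would go, assuming the client is started with brod:start_client/3. The broker address and client id (kafka, 9092, my_client) are placeholders and are not taken from the report:

```erlang
%% Hypothetical endpoint and client id, for illustration only.
Hosts = [{"kafka", 9092}],
ClientConfig = [
    %% Ask the OS to send TCP keepalive probes on otherwise idle connections.
    {extra_sock_opts, [{keepalive, true}]}
],
ok = brod:start_client(Hosts, my_client, ClientConfig).
```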
Used k3s orchestration
I am not even remotely an expert in Kubernetes, but which ingress controller do you use for the Kafka broker?
Please check https://github.com/estin/verne_and_brod which is a full test case in a Docker environment (no k3s used). brod is used in a VerneMQ plugin.
Kafka setting https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#connections.max.idle.ms
And after the payload, the connection goes down.
When Kafka closes the connection, brod can't reconnect.
Thanks!
I've been having similar problems.
The Kafka broker closes idle connections after 10 minutes. If you don't have any consumers and don't produce anything, the connection becomes idle and is closed. If you have consumers, they periodically send heartbeats, so it's not a problem there (at least not from what I have seen). The TCP keepalive time is often set much higher than 10 minutes by the OS, so keepalive won't help unless you change that. On my machine, cat /proc/sys/net/ipv4/tcp_keepalive_time => 7200 (seconds, i.e. 2 hours). It might not work anyway, because the Kafka broker might only look at requests, but I have not looked into that.
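For reference, the keepalive timing can in principle be lowered per socket instead of system-wide. The following is only a sketch under strong assumptions: it is Linux-specific, the numeric constants are the Linux values for IPPROTO_TCP and the TCP_KEEP* options, the chosen timings are arbitrary examples, and it assumes brod passes extra_sock_opts through to the socket unchanged. As noted above, it may still not stop the broker itself from closing a connection it considers idle.

```erlang
%% Linux-only assumption: IPPROTO_TCP = 6, TCP_KEEPIDLE = 4,
%% TCP_KEEPINTVL = 5, TCP_KEEPCNT = 6.
KeepaliveOpts = [
    {keepalive, true},
    {raw, 6, 4, <<120:32/native>>},  %% first probe after 120 s of idle
    {raw, 6, 5, <<30:32/native>>},   %% then one probe every 30 s
    {raw, 6, 6, <<4:32/native>>}     %% give up after 4 unanswered probes
],
ClientConfig = [{extra_sock_opts, KeepaliveOpts}].
```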
I also have the problem of running on Azure where they (load-balancers, event hubs) close connections after 4 minutes. So similar setups can have even lower allowed idle times for connections.
The Java client has the options metadata.max.age.ms and connections.max.idle.ms, which you can use to keep connections alive, or at least to close (and, I assume, re-open) them if they are idle for too long.
I can't find any similar functionality in brod, so I will handle it by wrapping brod and tracking when a connection (brod_client) becomes idle, based on when I last called produce. If it becomes idle, I will stop the client and then start it again. Hopefully this will fix my problems.
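The wrapper idea described above could look roughly like the sketch below. The module name brod_idle_guard, the record, and the RestartFun callback are all hypothetical; only brod:produce_sync/5 is an actual brod call. How the client is restarted is left to the caller, for example a fun that calls brod:stop_client/1 followed by brod:start_client/3.

```erlang
-module(brod_idle_guard).
-export([new/3, is_idle/2, produce/5]).

%% Hypothetical wrapper state: client id, allowed idle time,
%% and the timestamp of the last produce call (all times in ms).
-record(guard, {client, max_idle_ms, last_used_ms}).

new(Client, MaxIdleMs, NowMs) ->
    #guard{client = Client, max_idle_ms = MaxIdleMs, last_used_ms = NowMs}.

%% True when at least max_idle_ms have passed since the last produce.
is_idle(#guard{max_idle_ms = Max, last_used_ms = Last}, NowMs) ->
    NowMs - Last >= Max.

%% Restart the client via RestartFun if it has been idle too long,
%% then produce synchronously and record the time of this call.
produce(G = #guard{client = Client}, RestartFun, Topic, Partition, {Key, Value}) ->
    Now = erlang:monotonic_time(millisecond),
    case is_idle(G, Now) of
        true  -> ok = RestartFun(Client);
        false -> ok
    end,
    Result = brod:produce_sync(Client, Topic, Partition, Key, Value),
    {Result, G#guard{last_used_ms = Now}}.
```

The state is threaded through explicitly to keep the sketch short. In a real application it would more likely live in a gen_server. Callers of produce/5 should create the guard with erlang:monotonic_time(millisecond) as NowMs so both timestamps use the same clock.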
{reconnect_cool_down_seconds,<<"10">>},
This is the real issue. It should be {reconnect_cool_down_seconds,10} instead.
The comparison at https://github.com/klarna/brod/blob/master/src/brod_client.erl#L715 always returns false when a number is compared to a binary, because in Erlang's term order every number sorts before every binary.
The tcp_closed error is just the last error, re-emitted over and over again.
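The comparison behaviour can be reproduced in isolation. The module below is an illustrative stand-in with hypothetical names and is not brod's actual code. It only shows how a "has the cool-down elapsed?" check behaves when the configured value is a binary instead of an integer:

```erlang
-module(cool_down_demo).
-export([cooled_down/2]).

%% Illustrative stand-in for a cool-down check; not brod's actual code.
%% Returns true when at least CoolDownSeconds have elapsed.
cooled_down(ElapsedSeconds, CoolDownSeconds) ->
    ElapsedSeconds >= CoolDownSeconds.
```

With an integer setting, cool_down_demo:cooled_down(3600, 10) is true. With the misconfigured binary, cool_down_demo:cooled_down(3600, <<"10">>) is false no matter how much time has passed, so a check of this shape would never allow a reconnect.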
@zmstone hmm... in the repo where the bug was reproduced, this setting is OK: https://github.com/estin/verne_and_brod/blob/master/src/vernemq_demo_plugin_sup.erl#L25
The brod producer is designed NOT to reconnect immediately when idle. The reconnect is triggered when the next produce request comes in. Are you sure there are produce requests in your reproduction procedure?
@zmstone thanks!
I understand it now!
Now, after setting {reconnect_cool_down_seconds,10}, the socket is alive again.
misconfigured
Hi!
The client used to produce messages is "closed" after ~10 minutes of idle time.
When I try to produce a message, the client fails endlessly with this message
After restarting the whole app, everything is fine. After another 10 minutes of idle time, the client again can't reconnect to Kafka.
brod version: 3.15.1
brod client configuration
brod is used as part of a plugin for VerneMQ (I don't have much experience in these areas: Erlang, brod, VerneMQ). k3s is used for orchestration. Kafka is available the whole time and can produce/consume messages.
Any ideas why the client closes the connection after every 10 minutes of idle time and can't reconnect? Maybe some extra action is needed in the supervisor, or a simple change to the brod configuration?
Sorry for my poor English.