Open pratanini opened 3 years ago
@jszwedko Did you find time to look into it? It's blocking us from going to production.
Hey @santoshghhegde !
Apologies for the delay. I took a look at this just now and I think I'm missing some context. In the debug output logs you shared, I'm not seeing any failures writing to Kafka. In the title, you mentioned that Vector isn't retrying; are you observing it fail to write somewhere? Or are you observing the kafka sink stop processing altogether, silently?
Hi @jszwedko. Yes, in these logs it's failing silently, but I have also seen that Vector doesn't retry when network issues occur. That gives us two issues, I guess, though I don't know whether they are related.
So I dug into the source code and found that Vector does not handle kafka sink retries itself but instead leaves them to rdkafka's internal mechanisms. There's a message_timeout_ms parameter in the kafka sink, which translates to rdkafka's message.timeout.ms and defaults to 5 minutes.
From rdkafka docs:
Local message timeout. This value is only enforced locally and limits the time a produced message waits for successful delivery. A time of 0 is infinite. This is the maximum time librdkafka may use to deliver a message (including retries). Delivery error occurs when either the retry count or the message timeout are exceeded. The message timeout is automatically adjusted to transaction.timeout.ms if transactional.id is configured.
The retry count is set to its highest value by default, so this parameter is probably the only one relevant to retries.
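To make the rule in the quoted docs concrete, here is a rough toy model in Python. It is not librdkafka's or Vector's actual code: the function name, the fixed per-attempt duration, the fixed backoff, and the INT32_MAX stand-in for the "highest value" retry count are all assumptions made for illustration. It only shows that a message fails once either the retry count or the message timeout is exceeded, and that with a very large retry count the timeout is the limit that actually triggers.

```python
from typing import Optional

# Assumed stand-in for the "highest value" default retry count.
INT32_MAX = 2_147_483_647


def delivery_outcome(
    attempt_ms: int,
    backoff_ms: int,
    message_timeout_ms: int,
    max_retries: int = INT32_MAX,
    succeeds_on_attempt: Optional[int] = None,
) -> str:
    """Toy model: return 'delivered', 'retries_exceeded' or 'timed_out'.

    attempt_ms and backoff_ms are hypothetical fixed durations for one
    send attempt and for the pause between attempts.
    """
    elapsed = 0
    attempt = 0
    while True:
        attempt += 1
        elapsed += attempt_ms
        # Per the quoted docs, a timeout of 0 means "infinite".
        if message_timeout_ms and elapsed > message_timeout_ms:
            return "timed_out"
        if succeeds_on_attempt is not None and attempt >= succeeds_on_attempt:
            return "delivered"
        # The first attempt is not a retry, so retries used = attempt - 1.
        if attempt > max_retries:
            return "retries_exceeded"
        elapsed += backoff_ms


# Broker never reachable, default retry count, 5 minute timeout.
print(delivery_outcome(1000, 100, 300_000))
# Same, but with only 2 retries allowed.
print(delivery_outcome(1000, 100, 300_000, max_retries=2))
```

In this model the first call ends with "timed_out" and the second with "retries_exceeded", which matches the reading that the message timeout is the effective bound when the retry count is left at its default.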
Vector Version
Vector Configuration File
Debug Output
Expected Behavior
Vector sends logs to Kafka
Actual Behavior
Vector stops sending logs to Kafka
Additional Context