kaltura / nginx-kafka-log-module

Send Kafka messages from Nginx
BSD 2-Clause "Simplified" License

Auto reconnect issue #12

Open moweonlee opened 2 years ago

moweonlee commented 2 years ago

We have passed several tests with this module under some pretty heavy load without any problem.

By the way, I found a connection issue with the Kafka server after a long idle period (roughly 3 to 4 days without any message produced).

Here is the error message from the module.

[error] 86#86: *20873698 failed to produce to topic log-topic partition -1: Local: Fatal error while logging request, client: 10.96.77.101, server: ~^(nlog\.)?(?<domain>.+)$, request: "POST /n HTTP/1.1", host: "my.server.com"

So I'd like to know whether we need to implement some reconnection code here: https://github.com/kaltura/nginx-kafka-log-module/blob/master/ngx_http_kafka_log_module.c#L327

But as far as I know, librdkafka has reconnection support built in.

Any support would be appreciated.

Thank you in advance.

erankor commented 2 years ago

That is correct: librdkafka handles reconnections, and there are parameters to control it (reconnect.backoff.ms / reconnect.backoff.max.ms). The TODO you mentioned was inherited from the module on which this one is based: https://github.com/fooinha/nginx-json-log/blob/e48f4d646a47e672c18787f7b6d632fb2d1f0738/src/ngx_http_json_log_module.c#L2905 I don't think any special handling is required. We have been using this module for many years, and it just works.
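To make the interaction of the two parameters concrete, here is a minimal sketch of the nominal delay before each reconnect attempt. It assumes the delay starts at reconnect.backoff.ms, doubles after every failed attempt, and is capped at reconnect.backoff.max.ms. The random jitter that librdkafka applies is deliberately ignored, and the function name `reconnect_delay_ms` is hypothetical, not part of librdkafka or of this module.

```c
/* Nominal delay (in ms) before reconnect attempt number `attempt` (0-based).
 *
 * Assumption: the delay doubles after each failed attempt and never exceeds
 * backoff_max_ms. Jitter is not modelled, so real delays will differ. */
static long reconnect_delay_ms(long backoff_ms, long backoff_max_ms, int attempt)
{
    long delay = backoff_ms;

    /* Stop doubling once the cap is reached; this also avoids overflow. */
    for (int i = 0; i < attempt && delay < backoff_max_ms; i++) {
        delay *= 2;
    }

    return delay < backoff_max_ms ? delay : backoff_max_ms;
}
```

With illustrative values of 100 and 10000, the schedule under these assumptions is 100, 200, 400, 800, ... and then stays at 10000, so a broker that has been idle or unreachable for days is still retried at least every 10 seconds or so.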

moweonlee commented 1 year ago

@erankor

Yes, you're right. This module has worked fine without any problems for a long time.

But we found a deadlock case when we enabled idempotence with some modifications. (This is part of the code we modified to enable Kafka idempotence and transactions.)

image

The problem occurs when the Kafka client runs into a logical idempotence error (sequence number mismatch), which is reported as a "Fatal error".

I know this case is beyond the scope of your work, but I'd like to let you know about it for future development.

The code below works as a solution for this "Fatal error", because librdkafka cannot recover automatically in this case.

So we added some code to gracefully restart the nginx process simply by exiting the process.

image
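The screenshot is not preserved here, so the following is only a toy sketch of the mechanism described above, not the code from this thread and not nginx code. A worker that detects an unrecoverable producer state exits, and a supervising parent starts a fresh one. In a real module the check would presumably use librdkafka's `rd_kafka_fatal_error()`. Here `producer_is_fatal`, `worker_run`, and `supervise` are hypothetical stand-ins so that the example stays self-contained.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a real fatal-error check on the producer.
 * In this toy, only the first worker generation hits a fatal error. */
static int producer_is_fatal(int generation)
{
    return generation == 0;
}

/* Worker body: exit with a failure status when the producer is unusable,
 * so that the supervisor knows a replacement is needed. */
static void worker_run(int generation)
{
    if (producer_is_fatal(generation)) {
        _exit(EXIT_FAILURE);
    }
    _exit(EXIT_SUCCESS);
}

/* Supervisor: fork a worker and respawn it each time it exits with a
 * failure status. Returns the number of respawns needed until a worker
 * exits cleanly, or -1 on error or when max_respawns is exhausted. */
static int supervise(int max_respawns)
{
    int respawns = 0;

    for (int gen = 0; gen <= max_respawns; gen++) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            worker_run(gen);          /* child: never returns */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            return respawns;          /* healthy worker finished */
        }
        respawns++;                   /* worker gave up: start a new one */
    }

    return -1;
}
```

The design choice is to treat the fatal state as unrecoverable inside the process and rely on the parent to provide a clean producer instance, which matches the "restart by exiting" approach described above. How nginx itself reacts to a given worker exit status should be checked against the nginx version in use.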