scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Increase session.timeout.ms for KafkaConsumer #358

Open jpbalarini opened 5 years ago

jpbalarini commented 5 years ago

We were seeing duplicate messages on the bus, along with some warnings from Kafka.

The cause was that the consumers were being disconnected, so their messages were sent again. The heartbeat_interval_ms on the KafkaConsumer is tightly coupled to session_timeout_ms and cannot be set arbitrarily. If no heartbeat is received before the session timeout expires, the consumer is considered dead. Currently both are set to the same value, so the session can time out before the next heartbeat is sent. As the documentation for heartbeat.interval.ms at https://kafka.apache.org/documentation/ says, heartbeat_interval_ms must be lower than session_timeout_ms and typically should be no higher than 1/3 of it. This PR changes the settings to take that into account.

It can be seen on the kafka-python repo that the default values take this into consideration.
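
For illustration, the relationship between the two settings can be sketched as below. This is a minimal sketch, not the code in this PR: the helper names `consumer_timeouts` and `timeouts_are_safe` are hypothetical, and the 30000 ms session timeout is an example value only. The keyword names `session_timeout_ms` and `heartbeat_interval_ms` are the ones kafka-python's KafkaConsumer accepts.

```python
def consumer_timeouts(session_timeout_ms=30000, ratio=3):
    """Build timeout kwargs for KafkaConsumer (hypothetical helper).

    The heartbeat interval is derived from the session timeout so that
    several heartbeats fit inside one session window.
    """
    if session_timeout_ms <= 0:
        raise ValueError("session_timeout_ms must be positive")
    if ratio < 3:
        raise ValueError("ratio must be at least 3")
    return {
        "session_timeout_ms": session_timeout_ms,
        # Integer division keeps the heartbeat at or below 1/ratio of the timeout.
        "heartbeat_interval_ms": session_timeout_ms // ratio,
    }


def timeouts_are_safe(session_timeout_ms, heartbeat_interval_ms):
    """Return True if the heartbeat is no higher than 1/3 of the session timeout."""
    return heartbeat_interval_ms * 3 <= session_timeout_ms


if __name__ == "__main__":
    kwargs = consumer_timeouts()
    print(kwargs)
    # Equal values, as in the current settings, fail the check.
    print(timeouts_are_safe(30000, 30000))
    # Assumed usage; the topic name and broker address are placeholders:
    # from kafka import KafkaConsumer
    # consumer = KafkaConsumer("some-topic",
    #                          bootstrap_servers="localhost:9092",
    #                          group_id="some-group",
    #                          **kwargs)
```

Deriving the heartbeat from the session timeout, rather than setting both independently, keeps the two values from drifting back to an unsafe combination. The chosen session timeout still has to fall within the broker's group.min.session.timeout.ms and group.max.session.timeout.ms limits.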

Thanks!

jpbalarini commented 5 years ago

@sibiryakov can we introduce this? Thanks!