zalando / nakadi

A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
https://nakadi.io
MIT License
952 stars 295 forks source link

Keepalive commit #708

Open satybald opened 6 years ago

satybald commented 6 years ago

Context: Even thought Nakadi supports streaming of max_uncommitted_events, it still expects a client to commit an event within 60 seconds. Otherwise, a TCP connection gets closed.

Problem: Those consumers, who buffer events locally for a couple of hours before uploading to the downstream storage engine due to processing semantics, create at most once delivery guarantee. The likelihood of consumer, storage or network issues, during a buffer time, is quite high. Thus consumer can lose a whole buffer chunk while already have been committed those events to Nakadi.

Proposed change: As it necessary for Nakadi to keep track of consumer liveness(issue #594), and a data commit in ZK quite expensive, one of the possibility to introduce keepalive commit with fake/artificial offset(aka "BEGIN" or ZERO). Consumers will expect to ACK their healthiness via keep alive commit, however, do a real commit when downstream processing is finished.

Related issue:

phibrex commented 6 years ago

ARUHA-986 was added to our backlog.