mailgun / kafka-pixy

gRPC/REST proxy for Kafka
Apache License 2.0

at-least-once .. the README offers apparently contradictory information on this topic? #136

Closed nmarasoiu closed 6 years ago

nmarasoiu commented 6 years ago

Hi, we like this library very much. We have one question about whether at-least-once delivery is possible and what its performance implications are. First, though: after reading the README, I was left with the impression that it is not certain whether at-least-once is actually available at this point, because:

  1. The "sync" flag is commented that it is not taken into account
  2. It is mentioned that the production is async in all situations, thereby some messages can always be lost. We agree that async is performant and efficient but the question comes back to at-least-once.

At a higher level, we are aiming for exactly-once processing: stamp each message with an idempotence key that includes a UUID, send it over an at-least-once pipe (REST with retries, gRPC, or kafka-client), and compensate when a message is processed more than once.
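A minimal sketch of the consumer side of that plan, assuming an in-memory set of already-seen keys. The names `stamp`, `IdempotentProcessor`, and the `idempotence_key` field are hypothetical and are not part of Kafka-Pixy. For simplicity it skips duplicates rather than compensating for them, and a real deployment would need a durable store for the keys.

```python
import uuid


def stamp(payload):
    """Wrap a payload with a fresh idempotence key (hypothetical envelope format)."""
    return {"idempotence_key": str(uuid.uuid4()), "payload": payload}


class IdempotentProcessor:
    """Applies each envelope at most once, keyed by its idempotence key."""

    def __init__(self, apply):
        self._apply = apply   # side-effecting callback supplied by the caller
        self._seen = set()    # assumption: in-memory only, lost on restart

    def handle(self, envelope):
        """Return True if the payload was applied, False if it was a duplicate."""
        key = envelope["idempotence_key"]
        if key in self._seen:
            return False      # duplicate delivery from the at-least-once pipe
        self._apply(envelope["payload"])
        self._seen.add(key)   # record the key only after a successful apply
        return True
```

Because the key is recorded only after `apply` succeeds, a crash between the two steps leads to a second apply on redelivery, which is where the compensation step would come in.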

Please indicate whether at-least-once is possible now or will be in the future, and please clarify the apparent inconsistencies I found on a first read of the README. Thank you. Nicu Marasoiu, MetroSystems Romania

horkhe commented 6 years ago
  1. The "sync" flag is commented that it is not taken into account

It is not commented out, why do you say that?

  2. It is mentioned that the production is async in all situations, thereby some messages can always be lost. We agree that async is performant and efficient but the question comes back to at-least-once.

Could you please point to the place where the doc says that?

By the way, this is not a library but an application that is supposed to run on the same host as your application. It can work in at-least-once mode if the following conditions are met:

  1. Kafka-Pixy is configured to wait for acks from all ISRs (in-sync replicas) before responding with success (required_acks: wait_for_all)
  2. Your application produces in sync mode (ProdRq.async_mode=False)
  3. Your application acks consumed messages explicitly
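
To illustrate why the third condition gives at-least-once rather than exactly-once, here is a small simulation. It does not use the Kafka-Pixy API: `InMemoryQueue`, `run_consumer`, and `demo` are hypothetical stand-ins, and the only assumption is that an unacked message is delivered again.

```python
from dataclasses import dataclass


@dataclass
class InMemoryQueue:
    """Hypothetical stand-in for a proxy that redelivers unacked messages."""
    messages: list
    acked: int = 0  # every offset below this value has been acknowledged

    def consume(self):
        """Return (offset, message) for the first unacked message, or None."""
        if self.acked < len(self.messages):
            return self.acked, self.messages[self.acked]
        return None

    def ack(self, offset):
        self.acked = max(self.acked, offset + 1)


def run_consumer(queue, handler, max_attempts=10):
    """Process messages, acking each one only after the handler succeeds."""
    attempts = 0
    while attempts < max_attempts:
        item = queue.consume()
        if item is None:
            break
        attempts += 1
        offset, message = item
        try:
            handler(message)
        except RuntimeError:
            continue  # no ack was sent, so the same message comes back
        queue.ack(offset)


def demo():
    """Handler fails once on "b" after its side effect; return what it saw."""
    seen = []
    failed_once = set()

    def flaky_handler(message):
        seen.append(message)
        if message == "b" and message not in failed_once:
            failed_once.add(message)
            raise RuntimeError("simulated crash after side effect")

    run_consumer(InMemoryQueue(["a", "b", "c"]), flaky_handler)
    return seen
```

Calling `demo()` returns `["a", "b", "b", "c"]`: nothing is lost, but "b" is handled twice, so the application still has to tolerate duplicates.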
nmarasoiu commented 6 years ago

OK, thanks, that clarifies it.

sibsssidor commented 3 years ago

@horkhe Apologies for jumping into an old discussion. I'm not clear on what the behavior would be if multiple clients connect to the same Kafka-Pixy instance and ack explicitly. As far as I can see, there is no explicit tracking of client IDs (which makes things easier), but if that is the case, would the second client receive the same set of messages until the first one acks? If it gets new messages, can it ack an offset beyond the first client's, before the first one has acked? Thanks in advance for the help!