What people really want is definitely "exactly once", where by duplicate messages are not delivered. But within the context of a distributed system, "You Cannot Have Exactly-Once Delivery". [There are two common reasons duplicate may occur)[https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer]: one is network error when sending messages, the other is consumption's process crashes.
At most once semantics is easy to implement, actually in this condition the cluster will hit ultra high throughput and low latency, but data loss is an unacceptable option in many biz environment.
So, at least once delivery semantics becomes realistically the only option. We try everything to ensure no data loss, at the cost of complicated design, even if duplication.
There are three message delivery semantics.
What people really want is definitely "exactly once", where by duplicate messages are not delivered. But within the context of a distributed system, "You Cannot Have Exactly-Once Delivery". [There are two common reasons duplicate may occur)[https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer]: one is network error when sending messages, the other is consumption's process crashes.
At most once semantics is easy to implement, actually in this condition the cluster will hit ultra high throughput and low latency, but data loss is an unacceptable option in many biz environment.
So, at least once delivery semantics becomes realistically the only option. We try everything to ensure no data loss, at the cost of complicated design, even if duplication.