infinyon / fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.
https://www.fluvio.io/
Apache License 2.0
3.88k stars 491 forks source link

Remove unnecessary offset fetch during start of Consumer #2454

Open sehz opened 2 years ago

sehz commented 2 years ago

When consumer starts, it performs FetchOffsetsRequest in order to map logical offset (beginning, end, etc) to physical offset. This can be performed in the SPU itself instead of client reduces round trip as make it more reliable.

https://github.com/infinyon/fluvio/blob/master/crates/fluvio/src/consumer.rs#L290

galibey commented 2 years ago

It turned out, that the fix is much more complicated than it was supposed to be. The absolute offset received by FetchOffsetsRequest is used by (among other things) records filtering on the client-side. This filtering is needed because the server always sends back data by batch granularity (due to zero-copy responses). If the requested starting offset is in the middle of the batch, the whole batch will be sent to the client and the client code will filter out records before it sends to the end user's code. This only matters for the first batch in the stream but can't be ignored or workarounded somehow.

We could also add a response for StreamFetchRequest (currently, we do not expect a response for this request) and put resolved started offset there but it requires many changes in different places. The current flow of stream creation is generic and not specific to StreamFetchRequest. The gain from the fix seems less than efforts for refactoring, in my opinion.