rabbitmq / osiris

Log based streaming subsystem for RabbitMQ

Support "competing" readers #2

Open kjnilsson opened 4 years ago

kjnilsson commented 4 years ago

Currently all readers read the entire log; there is no mechanism for "competing" reads, where multiple readers consume entries/chunks in round-robin order in order to increase the speed at which entries for a given stream are processed.
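A minimal sketch of the round-robin allocation idea, assuming a coordinator hands out monotonically increasing chunk ids to readers held in a rotating queue (module and function names here are illustrative, not the Osiris API):

```erlang
%% Illustrative only: round-robin allocation of chunk ids over a fixed
%% set of readers. Not the actual Osiris API.
-module(rr_alloc).
-export([new/1, next/1]).

%% State: the next chunk id to hand out plus a queue of reader pids.
new(Readers) when is_list(Readers), Readers =/= [] ->
    {0, queue:from_list(Readers)}.

%% Allocate the next chunk id to the reader at the head of the queue,
%% then rotate that reader to the back to keep allocation round-robin.
next({ChunkId, Q0}) ->
    {{value, Reader}, Q1} = queue:out(Q0),
    {Reader, ChunkId, {ChunkId + 1, queue:in(Reader, Q1)}}.
```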

A possible design for doing competing reads:

Downsides:

Vanlightly commented 4 years ago

The above looks like an elegant design.

This leaves open the question of consumer offset tracking.

Without competing consumers, manually managed consumer offsets are easy: just periodically write the last offset to some kind of persistent store. With competing consumers this becomes a trickier problem to get right, as there is no longer a single offset that represents a high watermark of where consumption has reached.
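For contrast, a minimal sketch of the easy non-competing case, assuming a single reader and a placeholder store_offset/2 standing in for whatever persistent store the application uses:

```erlang
%% Illustrative only: periodic offset persistence for a single reader.
-module(single_offset).
-export([loop/3]).

loop(StreamId, LastOffset, Interval) ->
    receive
        {entry, Offset, _Entry} ->
            %% Process the entry, then remember the highest offset seen.
            loop(StreamId, max(Offset, LastOffset), Interval)
    after Interval ->
        %% Periodically persist the single high-watermark offset.
        store_offset(StreamId, LastOffset),
        loop(StreamId, LastOffset, Interval)
    end.

%% Placeholder for the application's persistent store (dets, database, ...).
store_offset(_StreamId, _Offset) ->
    ok.
```

With competing consumers, chunk ids complete out of order across readers, so the durable state is a set of outstanding allocations rather than one offset.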

Once we start offering built-in offset tracking, it again becomes more complex.

kjnilsson commented 4 years ago

I don't think a competing consumer can do its own offset tracking; the offset tracking is done by the read coordinator, so new consumers just join the round-robin queue and are advised of the next chunk id to read.
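A sketch of what that coordinator loop might look like under the description above; the join/ack/chunk_committed message shapes are assumptions made for illustration:

```erlang
%% Illustrative only: a read coordinator sketch. Joining or acking
%% readers queue up; each committed chunk id is allocated to the reader
%% at the head of the queue. Message shapes are assumptions.
-module(read_coord).
-export([start/0]).

start() ->
    spawn(fun() -> loop(queue:new(), queue:new()) end).

%% Readers: idle readers awaiting work. Pending: committed chunk ids
%% not yet allocated. Whenever both are non-empty, pair them up.
loop(Readers0, Pending0) ->
    {Readers, Pending} = allocate(Readers0, Pending0),
    receive
        {join, Reader} ->
            %% New consumers just join the round-robin queue.
            loop(queue:in(Reader, Readers), Pending);
        {ack, Reader, _DoneChunkId} ->
            %% A finished reader rejoins the back of the queue.
            loop(queue:in(Reader, Readers), Pending);
        {chunk_committed, ChunkId} ->
            loop(Readers, queue:in(ChunkId, Pending))
    end.

allocate(Readers0, Pending0) ->
    case {queue:out(Readers0), queue:out(Pending0)} of
        {{{value, Reader}, Readers}, {{value, ChunkId}, Pending}} ->
            %% Advise the next idle reader which chunk id to read.
            Reader ! {read, ChunkId},
            allocate(Readers, Pending);
        _ ->
            {Readers0, Pending0}
    end.
```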

gerhard commented 4 years ago

It all sounds reasonable 👍 from me.

lukebakken commented 4 years ago

👍 to "collaborate"

acogoluegnes commented 4 years ago
  • Readers will ack back when they have finished processing a chunk id so that they can be allocated another chunk id (some degree of pipelining should be allowed).

How will this translate in terms of API for the reading client (e.g. the stream plugin)? Right now they send_file or register_offset_listener to get notified there's something new. I would expect this to be transparent for them, as the reader/CRC would control which chunk they are supposed to send.

acogoluegnes commented 4 years ago
  • The CRC persists the current read state in the log as a special entry type so that it can be replicated and recovered from anywhere

OK, so competing readers get offset tracking for free? This is not supported yet for traditional readers, but when it is, they will have to issue a command (commit?) and will expect the broker to keep the offset where they left off. Are we on the same page for this?
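If the CRC's state lives in the log itself, recovery amounts to replaying the most recent state entry. A rough sketch under that assumption; the crc_state tag and the write/fold helpers are invented stand-ins for the log's real API:

```erlang
%% Illustrative only: persist coordinator state as a special entry type
%% and recover it on restart. Entry tag and helpers are placeholders.
-module(crc_state).
-export([persist/2, recover/1]).

-record(state, {next_chunk_id :: non_neg_integer(),
                outstanding :: [non_neg_integer()]}).

persist(Log, State = #state{}) ->
    %% Tag the entry so readers of the data stream can skip it.
    write_entry(Log, {crc_state, term_to_binary(State)}).

recover(Log) ->
    %% Fold over the log, keeping only the last crc_state entry seen.
    fold_entries(Log,
                 fun({crc_state, Bin}, _Acc) -> binary_to_term(Bin);
                    (_Entry, Acc) -> Acc
                 end,
                 #state{next_chunk_id = 0, outstanding = []}).

%% Placeholders for the log's actual write/fold API.
write_entry(_Log, _Entry) -> ok.
fold_entries(_Log, _Fun, Acc) -> Acc.
```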

acogoluegnes commented 4 years ago

What about replay semantics? Can a group of competing consumers start over? This implies some kind of deletion concept for the group, or at least an offset reset for this group.

kjnilsson commented 4 years ago
  • Readers will ack back when they have finished processing a chunk id so that they can be allocated another chunk id (some degree of pipelining should be allowed).

How will this translate in terms of API for the reading client (e.g. the stream plugin)? Right now they send_file or register_offset_listener to get notified there's something new. I would expect this to be transparent for them, as the reader/CRC would control which chunk they are supposed to send.

Yes, we should hide this complexity (choosing which process to send to) behind a common api.
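A sketch of what such a common api could look like from the client side: one attach function whose result hides whether the reader is standalone or part of a competing group (all names here are illustrative):

```erlang
%% Illustrative only: a single client-facing attach/1 that hides
%% whether the reader is standalone or part of a competing group.
-module(stream_read).
-export([attach/1]).

%% For a plain reader the client picks its own offset; for a competing
%% reader it joins the coordinator and is told which chunk to read.
attach({offset, Offset}) ->
    {plain_reader, Offset};
attach({competing, Coordinator}) ->
    Coordinator ! {join, self()},
    receive
        {read, ChunkId} -> {competing_reader, Coordinator, ChunkId}
    end.
```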

kjnilsson commented 4 years ago

OK, so competing readers get offset tracking for free? This is not supported yet for traditional readers, but when it is, they will have to issue a command (commit?) and will expect the broker to keep the offset where they left off. Are we on the same page for this?

Yes, but I could imagine using the same process for keeping consumer offsets for this stream too. Might need to ponder that a bit. :)

kjnilsson commented 4 years ago

What about replay semantics? Can a group of competing consumers start over? This implies some kind of deletion concept for the group, or at least an offset reset for this group.

No, I don't think replay would be supported for a competing reader group; you can of course replay if you like as a "normal" reader, i.e. not competing.
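Replay would then just mean attaching a fresh non-competing reader at the desired offset, e.g. with the hypothetical attach/1 sketched above:

```erlang
%% Illustrative only: replay the whole stream as a plain reader.
Reader = stream_read:attach({offset, 0}).
```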