rabbitmq / osiris

Log based streaming subsystem for RabbitMQ

Chunk Iterator API in osiris_log #150

Closed. kjnilsson closed this issue 1 year ago

kjnilsson commented 1 year ago

Is your feature request related to a problem? Please describe.

Some users of osiris_log offset readers may not want to read an entire chunk in one go, as they may not be able to deliver all of its messages to the consumer at once (e.g. AMQP 0.9.1 consumers whose prefetch limit has been reached, or AMQP 1.0 clients that have run out of link credit). To avoid having to keep the undelivered data in memory, we could implement an iterator API that returns a small piece of state to iterate over, reading each entry in the chunk on demand.

Describe the solution you'd like

New functions in osiris_log

This finds the next readable chunk and returns an opaque iterator state:

-spec next_chunk_iterator(state()) -> {ok, chunk_iterator()} | {end_of_stream, state()} | {error, term()}.

This gets the next entry, together with the updated iterator state, or signals the end of the chunk:

-spec iterator_next(chunk_iterator()) -> end_of_chunk | {chunk_iterator(), Entry :: {offset(), binary()}}.

This is optional and can return information about the iterator, such as the number of entries left:

-spec iterator_info(chunk_iterator()) -> map().

The iterator should handle uncompressed sub batches.
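The three proposed functions can be modelled in a few lines to show how a credit-limited consumer would use them. This is only a sketch in Python, not osiris code: the chunk layout (a 12-byte header followed by length-prefixed entries), the `encode_chunk` helper and the `deliver` loop are all assumptions made for illustration, and `None` stands in for the `end_of_stream` and `end_of_chunk` atoms.

```python
import io
import struct
from typing import NamedTuple


class ChunkIterator(NamedTuple):
    """Immutable iterator state; only positions are kept, never entry data."""
    fd: io.BufferedIOBase
    next_offset: int   # stream offset of the next entry
    entries_left: int  # entries not yet returned
    file_pos: int      # byte position of the next entry


def encode_chunk(first_offset, entries):
    # Hypothetical layout: 8-byte first offset, 4-byte entry count,
    # then each entry as a 4-byte length followed by its payload.
    header = struct.pack(">QI", first_offset, len(entries))
    return header + b"".join(struct.pack(">I", len(e)) + e for e in entries)


def next_chunk_iterator(fd, pos=0):
    fd.seek(pos)
    header = fd.read(12)
    if len(header) < 12:
        return None  # models end_of_stream
    first_offset, num_entries = struct.unpack(">QI", header)
    return ChunkIterator(fd, first_offset, num_entries, pos + 12)


def iterator_next(it):
    if it.entries_left == 0:
        return None  # models end_of_chunk
    it.fd.seek(it.file_pos)
    (size,) = struct.unpack(">I", it.fd.read(4))
    entry = (it.next_offset, it.fd.read(size))  # read one entry on demand
    new_it = it._replace(next_offset=it.next_offset + 1,
                         entries_left=it.entries_left - 1,
                         file_pos=it.file_pos + 4 + size)
    return new_it, entry


def iterator_info(it):
    return {"entries_left": it.entries_left, "next_offset": it.next_offset}


def deliver(it, credit):
    """Read at most `credit` entries; the rest stay on disk."""
    delivered = []
    while credit > 0:
        step = iterator_next(it)
        if step is None:
            break
        it, entry = step
        delivered.append(entry)
        credit -= 1
    return it, delivered
```

A consumer with two credits would call `deliver(it, 2)`, keep the returned iterator, and resume from it once more credit arrives, without holding the remaining entries in memory in the meantime.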

Describe alternatives you've considered

No response

Additional context

Because the entire chunk payload is never present in memory at the same time we are not able to perform the CRC check on the entries in the chunk.

Reading entries one at a time may increase the number of pread syscalls, so we may want to implement some kind of read-ahead inside the chunk iterator state.
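One way to picture the read-ahead idea is a small buffer that is refilled with a single larger read whenever a request falls outside it. The sketch below is a Python model under stated assumptions, not the osiris implementation: the `ReadAhead` class, its `window` parameter and the in-memory `data` standing in for the segment file are all hypothetical, and `preads` merely counts how many simulated pread calls were needed.

```python
class ReadAhead:
    """Serve small reads from a buffer filled by fewer, larger reads."""

    def __init__(self, data: bytes, window: int = 4096):
        self._data = data      # stands in for the segment file
        self._window = window  # how many bytes to fetch per simulated pread
        self._buf = b""
        self._buf_pos = 0      # file position of the first buffered byte
        self.preads = 0        # number of simulated pread calls

    def _pread(self, pos, size):
        self.preads += 1
        return self._data[pos:pos + size]

    def read(self, pos, size):
        start = pos - self._buf_pos
        if start < 0 or start + size > len(self._buf):
            # Miss: refill the buffer starting at the requested position.
            self._buf = self._pread(pos, max(size, self._window))
            self._buf_pos = pos
            start = 0
        return self._buf[start:start + size]
```

With a 64-byte window, eight consecutive 8-byte reads cost one simulated pread instead of eight. The trade-off is that the iterator state now holds up to `window` bytes, so the window size bounds the extra memory kept per reader.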

kjnilsson commented 1 year ago

Closed in #151