nsqio / go-diskqueue

A Go package providing a filesystem-backed FIFO queue
MIT License
460 stars, 103 forks

Peek multiple #40

Open baryluk opened 2 years ago

baryluk commented 2 years ago

I use diskqueue as a local staging queue before pushing to Kafka. I use the sarama Go library for the Kafka part.

I need highly reliable delivery and can tolerate a small number of duplicates (each message carries extra metadata that makes deduplication easy on the consumer side). To achieve high reliability and also absorb traffic spikes, this is what I do: in one goroutine I enqueue to diskqueue, and in another I do a blocking peek; when something is available I send it to Kafka, and once Kafka is happy, I do a read, which removes the message from the local diskqueue.
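For readers following along, here is a minimal sketch of that peek, send, then read loop. It is an illustration under stated assumptions, not the poster's actual code: `peekReader` is a hypothetical local interface covering only a peek channel and a read channel, `memQueue` is an in-memory stand-in for the disk queue, and `send` stands in for the Kafka producer call.

```go
package main

import (
	"errors"
	"fmt"
)

// peekReader is a hypothetical local interface covering only the two
// channels this workflow needs: one to peek at the head, one to consume it.
type peekReader interface {
	PeekChan() <-chan []byte
	ReadChan() <-chan []byte
}

// memQueue is an in-memory stand-in for the disk queue (assumption: it only
// exists to make the sketch runnable). Unlike a long-lived queue, it closes
// both channels once every item has been read.
type memQueue struct {
	peek, read chan []byte
}

func newMemQueue(items ...string) *memQueue {
	q := &memQueue{peek: make(chan []byte), read: make(chan []byte)}
	go func() {
		for _, it := range items {
			msg := []byte(it)
			for consumed := false; !consumed; {
				select {
				case q.peek <- msg: // peek: the head stays in place
				case q.read <- msg: // read: the head is removed
					consumed = true
				}
			}
		}
		close(q.peek)
		close(q.read)
	}()
	return q
}

func (q *memQueue) PeekChan() <-chan []byte { return q.peek }
func (q *memQueue) ReadChan() <-chan []byte { return q.read }

// errBrokerDown is a hypothetical error for simulating a failed send.
var errBrokerDown = errors.New("broker unavailable")

// forward peeks at the head, hands it to send, and removes it from the queue
// only after send succeeds. A failed send leaves the message queued, so a
// retry may produce a duplicate but never loses the message.
func forward(q peekReader, send func([]byte) error) (int, error) {
	forwarded := 0
	for msg := range q.PeekChan() {
		if err := send(msg); err != nil {
			return forwarded, err
		}
		<-q.ReadChan() // acknowledge: drop the message that was just sent
		forwarded++
	}
	return forwarded, nil
}

func main() {
	q := newMemQueue("a", "b", "c")
	n, err := forward(q, func(msg []byte) error {
		fmt.Printf("sent %s\n", msg) // a real sender would call the Kafka producer here
		return nil
	})
	fmt.Println("forwarded:", n, "error:", err)
}
```

Because each iteration handles exactly one message, every send is a separate round trip. A long-running consumer would also select on a shutdown signal instead of relying on the channels being closed.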

I could also achieve this in a more complex fashion, using various synchronization primitives and switching between the local diskqueue and Kafka depending on backpressure and in-memory queue size, but that would be quite complex and error-prone. So I just decoupled the two (at the cost of always going to disk, which in the end is not bad, because it protects me from any program crash or exit).

The issue is peeking performance. Because I can only peek one element, I cannot do Kafka batching, which limits throughput, and makes compression less effective.

I would like to peek at up to, say, the next 100 elements, send them as a batch, then remove all the sent elements (using readChan).

mreiferson commented 2 years ago

That's interesting. It sounds like you'd actually want some API to "commit" the block of reads, rather than having to pull them all off readChan, too.
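To make the idea concrete, here is a rough sketch of what a batch peek plus "commit" could look like. Everything in it is hypothetical: `batchQueue`, `PeekN`, and `Commit` are names invented for this illustration and are not part of go-diskqueue, `sliceQueue` is an in-memory stand-in, and `sendBatch` stands in for a batched Kafka send.

```go
package main

import (
	"errors"
	"fmt"
)

// batchQueue is a hypothetical interface, invented for this sketch only.
type batchQueue interface {
	PeekN(max int) [][]byte // up to max messages from the head, not removed
	Commit(n int)           // drops the first n messages in one step
}

// sliceQueue is an in-memory stand-in used only to exercise the sketch.
type sliceQueue struct{ items [][]byte }

func (q *sliceQueue) PeekN(max int) [][]byte {
	if max < 0 {
		max = 0
	}
	if max > len(q.items) {
		max = len(q.items)
	}
	return q.items[:max]
}

func (q *sliceQueue) Commit(n int) {
	if n > len(q.items) {
		n = len(q.items)
	}
	q.items = q.items[n:]
}

// errBrokerDown is a hypothetical error for simulating a failed send.
var errBrokerDown = errors.New("broker unavailable")

// drain sends batches of up to batchSize messages until the queue is empty
// or a send fails. A batch is committed only after sendBatch succeeds, so a
// failed batch is peeked again on the next attempt (at-least-once delivery).
func drain(q batchQueue, batchSize int, sendBatch func([][]byte) error) (int, error) {
	sent := 0
	for {
		batch := q.PeekN(batchSize)
		if len(batch) == 0 {
			return sent, nil
		}
		if err := sendBatch(batch); err != nil {
			return sent, err // nothing from this batch has been removed
		}
		q.Commit(len(batch))
		sent += len(batch)
	}
}

func main() {
	q := &sliceQueue{}
	for i := 0; i < 250; i++ {
		q.items = append(q.items, []byte(fmt.Sprintf("msg-%d", i)))
	}
	n, err := drain(q, 100, func(batch [][]byte) error {
		fmt.Println("sending batch of", len(batch))
		return nil
	})
	fmt.Println("sent:", n, "error:", err)
}
```

Compared with pulling each sent message off readChan one at a time, a single commit per batch keeps the acknowledgement step proportional to the number of batches rather than the number of messages.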