Open svroonland opened 1 week ago
When I last looked into this my conclusion was that it is not possible get an unbounded number of items in these queues. The only moment these queues can grow is when the runloop is paused, that is, when no stream is pulling data, and no stream is committing offsets. However, under this condition no new commands and no new commits are added. In other words, the situation of unbounded growth is not possible (pending bugs of course).
pending bugs of course Making the queues bounded would exclude one more possible cause when troubleshooting issues that we would encounter in the future, like timeouts and memory leaks. In any case it's good practice to have bounded resource usage in systems and have some backpressure. I believe with ZIO's queues that bounded ones are more performant as well.
Determining what would be a good queue size is a second question.
For the command queue, which is cleared on every poll cycle, we'd need to accommodate:
CommitAvailable
is added. This number is potentially unbounded as well, while we only need at most one (addressed in #1390).Poll
, CommitAvailable
, AddSubscription
The commit queue should be large enough to accomodate at most max.poll.records
. That is, when the user is committing offsets one by one and awaiting them one by one, which is not very performant, they should be batching them anyway. Making the queue bounded would just introduce a bit of extra transparent backpressure on the commit
call, replacing the await
delay by some offer
delay.
Remember https://github.com/zio/zio-kafka/pull/986 ? What changed that we make the queues bounded again?
Oh wow, I did not remember that indeed. Then we need at least a good theory of what happend in 986 and what would be different this time.
One thing that is different is that we now a separate commit queue.
Another thing that is different is that we now have metrics, so you can keep an eye on the actual queue sizes.
The
commandQueue
andcommitQueue
are now unbounded, which could lead to unbounded memory usage in exceptional situations. Having bounded queues would be preferred, though we need to carefully examine the points where backpressure is created - when offering to the queue when it is full - and make sure that does not cause any issues.Since committing is a blocking operation already (in the conceptual sense, not a blocking thread), backpressure from the queue would not be an issue for the end user. Commits are added back to the queue on failure, which might be a problem when it is full.