zio / zio-kafka

A Kafka client for ZIO and ZIO Streams
https://zio.dev/zio-kafka
Apache License 2.0
342 stars 139 forks source link

Make queues bounded #1386

Open svroonland opened 1 week ago

svroonland commented 1 week ago

The commandQueue and commitQueue are now unbounded, which could lead to unbounded memory usage in exceptional situations. Having bounded queues would be preferred, though we need to carefully examine the points where backpressure is created - when offering to the queue when it is full - and make sure that does not cause any issues.

Since committing is a blocking operation already (in the conceptual sense, not a blocking thread), backpressure from the queue would not be an issue for the end user. Commits are added back to the queue on failure, which might be a problem when it is full.

erikvanoosten commented 1 week ago

When I last looked into this my conclusion was that it is not possible get an unbounded number of items in these queues. The only moment these queues can grow is when the runloop is paused, that is, when no stream is pulling data, and no stream is committing offsets. However, under this condition no new commands and no new commits are added. In other words, the situation of unbounded growth is not possible (pending bugs of course).

svroonland commented 1 week ago

pending bugs of course Making the queues bounded would exclude one more possible cause when troubleshooting issues that we would encounter in the future, like timeouts and memory leaks. In any case it's good practice to have bounded resource usage in systems and have some backpressure. I believe with ZIO's queues that bounded ones are more performant as well.

Determining what would be a good queue size is a second question.

For the command queue, which is cleared on every poll cycle, we'd need to accommodate:

The commit queue should be large enough to accomodate at most max.poll.records. That is, when the user is committing offsets one by one and awaiting them one by one, which is not very performant, they should be batching them anyway. Making the queue bounded would just introduce a bit of extra transparent backpressure on the commit call, replacing the await delay by some offer delay.

erikvanoosten commented 1 week ago

Remember https://github.com/zio/zio-kafka/pull/986 ? What changed that we make the queues bounded again?

svroonland commented 1 week ago

Oh wow, I did not remember that indeed. Then we need at least a good theory of what happend in 986 and what would be different this time.

svroonland commented 1 week ago

One thing that is different is that we now a separate commit queue.

erikvanoosten commented 1 week ago

Another thing that is different is that we now have metrics, so you can keep an eye on the actual queue sizes.