haskell-works / hw-kafka-client

Kafka client for Haskell, including auto-rebalancing consumers
MIT License
140 stars 50 forks source link

pollMessageBatch not timing out #196

Closed JeanHuguesdeRaigniac closed 1 year ago

JeanHuguesdeRaigniac commented 1 year ago

Doc says about pollMessageBatch: "Polls up to BatchSize messages. Unlike pollMessage this function does not return usual "timeout" errors. An empty batch is returned when there are no messages available."

How is to be understood this Timeout parameter? It seems useless. In our app, this function does not return before having gathered BatchSize records:

msgs <- pollMessageBatch consumer (Timeout 500) (BatchSize totalNumberOfRecords)

This is not the desired behavior since we don't want messages to wait in topics.

We rather expect it to try to get up to BatchSize records before Timeout, i.e. return what it got so far if Timeout occurred before BatchSize was reached.

Is it a bug or is there another way to do it? Like:

msgs <- replicateM totalNumberOfRecords $ pollMessage consumer (Timeout 500)  
AlexeyRaga commented 1 year ago

Hi @JeanHuguesdeRaigniac

How is to be understood this Timeout parameter? It seems useless.

The timeout parameter, like in other similar functions, specifies the amount of time to wait when there are no messages to consume. It is there to basically simplify polling and to avoid tight loops.

In our app, this function does not return before having gathered BatchSize records This is not the desired behavior since we don't want messages to wait in topics.

Neither is an intended one. Your guess is correct and it should return when timeout occurred even if there are not enough messages received.

In fact, we even have tests that ensure this behaviour, and it passes:

image

I am not sure if there is a bug and why it would work in the way you describe... One thing that we don't check in tests is what happens if you try to consume the topic that doesn't yet exist. Can it be your case? I need to check what would happen in this situation, this is an edge case that I don't remember thinking about :)

Is it a bug or is there another way to do it? Like replicateM

You can go with just pollMessage (same logic about timeout applies here), and I don't think that there is any difference in terms of performance, since librdkafka internally would still fetch batches from Kafka into its internal queues. The only difference here is operational: if you really want to process multiple messages at once, then pollMessageBatch is more convenient. But if you logically process one message by one, then pollMessage is just fine.

JeanHuguesdeRaigniac commented 1 year ago

Thank you @AlexeyRaga for this comprehensive reply.

It could not be because of a missing topic, we use it daily.

I delayed this comment because I wanted to find an occasion to dig deeper this issue but I can't: there is not enough data to look for causes. It crossed my mind maybe the last consumed message offset was not sent back to update the lag counter but I can't check it. I also have doubts about the bug analysis brought to my attention because it does not match the way we use this topic.

Anyway, no need to clutter your issues with it anymore.