hughlivingstone opened this issue 4 years ago (status: Open)
@Nevon @ankon Looking at the code, I believe I can see what the issue is here. I can propose a fix, but before I do, I want to confirm what the desired behaviour actually is. When an error is thrown from eachBatch (or from a broker fetch request), is it correct that we need to:

1. wait for any remaining fetch requests to complete (or at least dump any returned batches)
2. wait for any currently executing eachBatch calls to complete
3. abandon processing of any more batches
4. re-throw the error?
Currently the code doesn't seem to do 1–3: the barrier just resolves on the first error.
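For illustration, here is a minimal sketch of the distinction in plain TypeScript promises. These function names are hypothetical and this is not the KafkaJS internal barrier:

```ts
// A barrier built on Promise.all settles as soon as the first worker
// rejects, leaving the sibling workers still running in the background:
async function barrierFailFast(workers: Promise<void>[]): Promise<void> {
  await Promise.all(workers); // rejects on the first error; siblings keep running
}

// Waiting for every in-flight worker before surfacing the error
// (steps 1, 2, and 4 above):
async function barrierDrainThenThrow(workers: Promise<void>[]): Promise<void> {
  const results = await Promise.allSettled(workers); // waits for all to settle
  const firstFailure = results.find(
    (r): r is PromiseRejectedResult => r.status === 'rejected'
  );
  if (firstFailure) {
    throw firstFailure.reason; // re-throw only after everything has finished
  }
}
```

The second form matches behaviours 1, 2, and 4 above: everything in flight settles before the error propagates.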
Describe the bug
We are using `partitionsConsumedConcurrently: 3` so we consume messages concurrently. However, we are finding that when we throw an error out of our eachBatch/eachMessage handler, it causes all partitions to start processing their messages again, even though the partitions that did not fail are still being processed. This leads to us processing the same message multiple times in parallel. The log output of a test we ran illustrates this below. We are also seeing this same error in our production environment.
I have attached a file below that we used to produce a similar output to the above: kafka.concurrency.test.ts.zip. It is TypeScript, and you will need to be running a Kafka container and configure the host/port appropriately.
I have also attached a log with debug logging enabled: test-output.log
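For reference, here is a minimal sketch of the kind of consumer setup involved. This is illustrative, not the attached test: the broker address, topic, group id, and failure condition below are all placeholders.

```ts
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'concurrency-repro', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'concurrency-repro-group' });

async function main(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topic: 'repro-topic', fromBeginning: true });

  await consumer.run({
    // Three partitions are processed concurrently, as in the report above
    partitionsConsumedConcurrently: 3,
    eachMessage: async ({ partition, message }) => {
      console.log(`partition ${partition}: start offset ${message.offset}`);
      // Simulate a failure on a single partition; per the report, this
      // causes all three partitions to reprocess from their last committed
      // offsets, not just the one that failed
      if (partition === 1) {
        throw new Error('simulated handler failure');
      }
      // Simulate slow processing so the other partitions are still busy
      // when the error above is thrown
      await new Promise<void>((resolve) => setTimeout(resolve, 5000));
      console.log(`partition ${partition}: done offset ${message.offset}`);
    },
  });
}

main().catch(console.error);
```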
To Reproduce
Expected behavior
Message processing on the partition where the error was thrown should retry; the other 2 partitions should not start being processed again.
Observed behavior
The consumer starts processing all 3 messages from the beginning immediately after the error is thrown. This happens even though the 2 partitions that did not fail are still being processed.
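One way to confirm that the whole consumer is restarting (rather than a single partition retrying) is to listen for KafkaJS's crash instrumentation event. A minimal sketch, assuming the `consumer` instance from the setup sketch above:

```ts
// Each restart cycle begins with a 'consumer.crash' event; if this fires
// once and then all partitions reprocess from their last committed offsets,
// the restart is consumer-wide, matching the behavior described above
consumer.on(consumer.events.CRASH, ({ payload }) => {
  console.log('consumer crashed:', payload.error.message);
});
```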
Environment: