spring-projects / spring-kafka

Provides Familiar Spring Abstractions for Apache Kafka
https://projects.spring.io/spring-kafka
Apache License 2.0

Potential Bug: Missing Failed Records During Async Operation #3638

Closed chickenchickenlove closed 21 hours ago

chickenchickenlove commented 2 days ago

In what version(s) of Spring for Apache Kafka are you seeing this issue?

3.3-SNAPSHOT

Describe the bug

Since an earlier issue, spring-kafka has supported async retry with retry topics. However, IMHO, there is a potential bug in the code linked here: https://github.com/spring-projects/spring-kafka/blob/70a11a698284ce68293eaaf735fb9756713a3bec/spring-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java#L1467-L1469

Consider the following scenario, where Thread A is an executor thread running a Mono or CompletableFuture:

  1. Main Thread: copies the records from failedRecords. At this point failedRecords.size() is 100, so the Main Thread holds 100 failed records to retry.
  2. Thread A: hits an exception during its operation and adds the record to failedRecords. Now failedRecords.size() is 101.
  3. Main Thread: clears failedRecords by calling failedRecords.clear().

In this scenario, the Main Thread holds 100 failed records to retry but removes 101 records from failedRecords. The record added by Thread A between the copy and the clear is therefore never retried.
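The interleaving above can be sketched in isolation. This is a minimal single-threaded simulation, not the container's actual code: the class name, the `String` record type, and the helper methods are all hypothetical, and the `Runnable` stands in for Thread A adding a record at the worst possible moment. It contrasts the copy-then-clear pattern with one possible alternative, draining the shared collection with `poll()`, so that each record is removed only when it is handed to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch; names do not come from spring-kafka.
class FailedRecordsDrainSketch {

    // Stand-in for the shared failedRecords collection (assumed to be a concurrent deque).
    static final ConcurrentLinkedDeque<String> failedRecords = new ConcurrentLinkedDeque<>();

    // Reset the shared collection to n records.
    static void fill(int n) {
        failedRecords.clear();
        for (int i = 0; i < n; i++) {
            failedRecords.add("record-" + i);
        }
    }

    // Copy-then-clear: a record added between the copy and clear() is dropped.
    static List<String> copyThenClear(Runnable threadA) {
        List<String> copy = new ArrayList<>(failedRecords); // step 1: copy
        threadA.run();                                      // step 2: Thread A adds a record
        failedRecords.clear();                              // step 3: clear removes it too
        return copy;
    }

    // Drain with poll(): only records actually handed to the caller are removed.
    static List<String> drain(Runnable threadA) {
        List<String> drained = new ArrayList<>();
        boolean injected = false;
        String record;
        while ((record = failedRecords.poll()) != null) {
            drained.add(record);
            if (!injected) {
                threadA.run(); // Thread A adds a record mid-drain
                injected = true;
            }
        }
        return drained;
    }

    public static void main(String[] args) {
        fill(100);
        int retried = copyThenClear(() -> failedRecords.add("late")).size();
        System.out.println("copyThenClear: retried=" + retried
                + " remaining=" + failedRecords.size());

        fill(100);
        retried = drain(() -> failedRecords.add("late")).size();
        System.out.println("drain: retried=" + retried
                + " remaining=" + failedRecords.size());
    }
}
```

In the first case 101 records were added but only 100 are returned and none remain, so one is lost. In the second case the late record is either drained in the same pass or left in the deque for the next one; in both outcomes it stays accounted for. Whether a drain like this fits the container's threading model is an open question for the maintainers.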

To Reproduce

Expected behavior

The KafkaMessageListenerContainer should not miss any failedRecords during handleAsyncFailure.

Sample