payara / ecosystem-support

Placeholder repository to handle community requests for the Payara Platform ecosystem tools

Exactly-once semantics with Kafka Connector #19

Closed jfbenckhuijsen closed 1 year ago

jfbenckhuijsen commented 4 years ago

Hi,

I was looking at the Kafka JCA adapter with a view to using it in a backend system we're currently developing. One of the issues I ran into is the lack of support for exactly-once semantics when processing Kafka messages. The current connector only supports at-least-once and at-most-once delivery.

Of course, Kafka doesn't support exactly-once by itself; however, you can implement it yourself, provided the necessary extension points are available. The current structure of the JCA adapter, however, doesn't allow access to these extension points. I started working on a PR to expose them through the JCA adapter, but then ran into a second issue: the current implementation has limitations in how threads are managed, which means this will never work. Basically, the current approach cannot guarantee ordering of message delivery, which is a really nice feature of Kafka that we need.
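To illustrate what "doing it yourself" could look like on the consumer side, here is a minimal sketch, not the adapter's code. It assumes records for a partition arrive in offset order, which is exactly the ordering guarantee discussed above. The class `OffsetTrackingProcessor` and its methods are hypothetical names; in a real system the offset map would be stored in the same database transaction as the business results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: skips redelivered records by tracking the last applied offset per partition. */
class OffsetTrackingProcessor {
    // Highest offset already applied, per partition. Assumption: in production this
    // state is persisted atomically together with the processing result.
    private final Map<Integer, Long> lastApplied = new HashMap<>();
    // Stand-in for the business side effect of processing a record.
    private final List<String> applied = new ArrayList<>();

    /** Applies the record unless its offset was already applied; returns true if it was applied. */
    boolean process(int partition, long offset, String value) {
        long last = lastApplied.getOrDefault(partition, -1L);
        if (offset <= last) {
            return false; // duplicate caused by a replay, so skip it
        }
        applied.add(value);
        lastApplied.put(partition, offset);
        return true;
    }

    /** Offset to resume from after a restart, i.e. the one after the last applied offset. */
    long resumeOffset(int partition) {
        return lastApplied.getOrDefault(partition, -1L) + 1;
    }

    List<String> applied() {
        return applied;
    }
}
```

The duplicate check only works if delivery within a partition is ordered; with out-of-order delivery a lower offset arriving late would be wrongly discarded.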

I could extend the PR with a rewrite of the thread handling to alleviate this; however, I first want to check with you whether such a PR would be accepted, as it would be quite an incompatible change.

(Btw, I'm working on this on behalf of the company I work for, meaning I still have to get the OK to publish such a PR in the first place, so no guarantees there. For obvious reasons I cannot disclose the company at the moment.)

smillidge commented 4 years ago

In-order semantics implies a single MDB processing messages on a single thread. This is possible by limiting the number of MDB instances to 1 in your application server of choice. Note that commitEachPoll would need to be set to true to get close to exactly-once. With this combination, a number of Kafka messages will be retrieved on each poll, and each message will then be processed by the single MDB in turn. Note, however, that Kafka, as you say, does not really support exactly-once: if a failure occurs during processing, the messages in the batch will be replayed, because the commit will not reach Kafka until all records received have been processed. I suppose we could add a setting where the commit is sent to Kafka before processing completes, in which case there could be message loss.
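The difference between the two commit timings can be shown with a toy model. This is a minimal sketch over an in-memory list, not the adapter's implementation and not using the Kafka client; `CommitTimingDemo`, `run`, and `failAt` are hypothetical names, and the crash is simulated exactly once.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a poll/process/commit loop with one simulated crash. */
class CommitTimingDemo {
    /**
     * Delivers records from log in batches of batchSize, crashing once at index failAt.
     * Returns every value the handler processed, in processing order.
     */
    static List<String> run(List<String> log, int batchSize, int failAt, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;      // offset a restarted consumer resumes from
        boolean failed = false; // ensures the simulated crash happens only once
        while (committed < log.size()) {
            int start = committed;
            int end = Math.min(start + batchSize, log.size());
            if (commitBeforeProcessing) {
                committed = end; // commit first: a crash below loses the rest of the batch
            }
            try {
                for (int i = start; i < end; i++) {
                    if (i == failAt && !failed) {
                        failed = true;
                        throw new IllegalStateException("simulated crash");
                    }
                    processed.add(log.get(i));
                }
                committed = end; // commit last: a crash above replays the whole batch
            } catch (IllegalStateException e) {
                // "Restart": the next iteration resumes from the committed offset.
            }
        }
        return processed;
    }
}
```

With the log `[a, b, c, d]`, a batch size of 4, and a crash at index 2, committing after processing yields `[a, b, a, b, c, d]` (records a and b are handled twice), while committing before processing yields `[a, b]` (records c and d are lost).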

smillidge commented 4 years ago

The latest version of the adapter also has useSynchMode, which forces all polling of Kafka to be performed in a single thread; the same thread is then used to send the messages to the MDB.

bbiallowons commented 3 years ago

Is it possible to implement exactly-once with the transaction API in Kafka (https://www.confluent.de/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/, https://www.confluent.io/blog/transactions-apache-kafka/)?

fturizo commented 1 year ago

We're sorry, but this issue fell through the cracks over the past few years. If you still require this enhancement, please let us know by raising a new issue with a more complete report.