krasserm / akka-persistence-kafka

A replicated Akka Persistence journal backed by Apache Kafka
Apache License 2.0
201 stars, 59 forks

Transactional Producers, Akka 2.5.3, Kafka 1.0 #37

Closed. giena closed this 5 years ago

giena commented 6 years ago

It's all in the title.

olivierdeckers commented 6 years ago

Hi Giena, thanks for your pull request.

I took a look at the code and have a couple of questions:

  1. Why do you use "read_uncommitted" for getting the highest offset?
  2. As I understand it, there is a single journal actor that processes the write batches sequentially, which is why you are having trouble with throughput when persisting many small batches. You could fix the performance issues by reintroducing the mechanism that delegated to writers that use their own producers. Because a single producer can have at most one transaction at any time, we need multiple producers to increase throughput. Multiple producers (with consistent hashing for delegation) are not a problem for consistency because persistent actors are independent from each other.

Let me know what you think. I have a pull request open that misses some of the things you did (eg. upgrade the akka version), but has the multiple writers feature which results in higher throughput. If you don't have the time to continue on this I can combine both pull requests.
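The multiple-writers idea from point 2 can be sketched as follows. This is a minimal, self-contained illustration, not the code from either pull request: `WriterPool` is a hypothetical name, and plain integer indices stand in for the actual writer actors, each of which would own its own transactional producer.

```scala
// Sketch: route each persistenceId to one of N writers via hashing.
// Every event for a given persistenceId always lands on the same writer,
// so per-actor ordering is preserved, while independent persistent actors
// can proceed in parallel through different producers.
class WriterPool(writeConcurrency: Int) {
  // Stand-ins for writer actors; the real journal would hold ActorRefs here.
  private val writers: Vector[Int] = Vector.tabulate(writeConcurrency)(identity)

  def writerFor(persistenceId: String): Int =
    writers(math.abs(persistenceId.hashCode) % writeConcurrency)
}
```

With `writeConcurrency = 8`, different persistence ids may map to different writers, but repeated calls with the same id always return the same writer, which is what keeps writes for one actor ordered.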

giena commented 6 years ago

Hi Olivier,

Thank you for your analysis.

  1. I was thinking that I could use a non-transactional producer for the case of a single event to persist, but I will not do that. So I can use read_committed. You're right.
  2. I think you're wrong in thinking that writer actors would help. If I remember correctly, for a given persistence id you always use the same producer (via the modulo operation). See this code from your pull request:

     ```scala
     private def writerFor(persistenceId: String): ActorRef =
       writers(math.abs(persistenceId.hashCode) % config.writeConcurrency)
     ```

     It only helps if you have a lot of persistent actors. I don't think it's a good idea to have a lot of topics in Kafka, so it does not make sense to me. If you have only one persistent actor in your JVM, you will always use the same transaction/producer: you could have 8 writers and only one would ever be used. And initially the writers were only there because the old producer was synchronous.

Tell me if I'm mistaken. Could you please benchmark the two solutions to get an idea of the performance loss, and tell me how I can run the benchmark here?

olivierdeckers commented 6 years ago

You are right that this won't increase throughput in the case of a single persistent entity. I'm not sure whether the throughput test in the TCK uses a single persistent entity. But in a real-world case there will be far more than eight persistent actors, so throughput will benefit a lot from multiple producers.

If you think about it, the producer in this pull request still blocks, since committing a transaction is a blocking call. So it makes sense to keep the multiple writers.

With regard to running many persistent actors on Kafka resulting in a lot of topics being created: we could decouple the number of topics from the number of actors by storing the journals of multiple actors in a single topic and filtering by persistence id on replay. That is a separate issue, however, and not as high a priority IMO.
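The shared-topic idea can be sketched roughly like this, under assumed names (`JournalEntry` and `replay` are made up for illustration; in the real journal the persistence id would travel in the record key or a header):

```scala
// Sketch: several persistent actors share one topic; replay filters by id.
// JournalEntry stands in for a deserialized Kafka record.
final case class JournalEntry(persistenceId: String, sequenceNr: Long, payload: String)

def replay(topic: Seq[JournalEntry], persistenceId: String,
           fromSequenceNr: Long, toSequenceNr: Long): Seq[JournalEntry] =
  topic.filter(e =>
    e.persistenceId == persistenceId &&
    e.sequenceNr >= fromSequenceNr &&
    e.sequenceNr <= toSequenceNr)
```

The trade-off is visible in the sketch: replaying one actor now has to scan (and discard) the records of every other actor sharing the topic, which is presumably why this is treated as a separate, lower-priority issue.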

giena commented 6 years ago

The throughput on persist (not persistAsync) is tested with only one persistent actor (the test is optional, but it is part of the TCK). So all the persist calls are serialized through the single journal actor that manages this persistent actor, and each event produces its own begin/send one message/commit cycle. There's no doubt about that. But you're right that it makes sense to keep the writers, since commit is blocking.
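The per-event cost described above can be made concrete with a counting stub. The method names mirror the real Kafka producer's transactional API (`beginTransaction`/`send`/`commitTransaction`), but `CountingProducer` is only a stand-in that counts calls; it does not talk to a broker:

```scala
// Sketch: with persist (not persistAsync) and a single persistent actor,
// each event goes through its own begin/send/commit cycle, so N events
// cost N blocking commits on one producer, with no opportunity to batch.
class CountingProducer {
  var begins = 0
  var sends = 0
  var commits = 0
  def beginTransaction(): Unit = begins += 1
  def send(event: String): Unit = sends += 1
  def commitTransaction(): Unit = commits += 1 // blocking in the real client
}

def persistAll(producer: CountingProducer, events: Seq[String]): Unit =
  events.foreach { e =>
    producer.beginTransaction()
    producer.send(e)
    producer.commitTransaction()
  }
```

Running `persistAll` with three events yields three full transaction cycles, which is the serialization effect the TCK throughput test exercises when it uses a single persistent actor.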

In my pull request, I made a best effort to remove the old API and the ZooKeeper calls. In yours, I see that you are still using ZooKeeper to complete the broker set. The producer API already does that; why do you keep it? I also removed the embedded test broker code, since it already exists in the Kafka project, so as not to reinvent the wheel. We cannot keep mixing old, obsolete code with the new.

If we agree on this last point, we could combine both pull requests.

Another question: does it make sense to keep two producers per writer? One could do the job.

martypitt commented 6 years ago

We're looking to leverage Kafka as an event journal for Akka, and would love to use this - any chance this PR could get merged?

Reeebuuk commented 6 years ago

@krasserm ? :)

boxsterman commented 6 years ago

We are also looking into the possibility of leveraging an (existing) Kafka setup for Akka persistence. Any news on merging this PR?

giena commented 5 years ago

@krasserm Let's go for a merge or a review?

giena commented 5 years ago

I'm closing this pull request. @krasserm seems to be too busy to take a look. The fork is here: https://github.com/worldline-messaging/akka-persistence-kafka and it will be improved regularly.