jonathansant / Orleans.Streams.Kafka

An implementation of a PersistentStreamProvider for Microsoft Orleans and Kafka using the Confluent API
GNU General Public License v3.0

Hash Collisions will lead to producing on the wrong partition? #48

Open · dvlprx opened this issue 1 year ago

dvlprx commented 1 year ago

I have a question regarding the message key that is generated when producing a message. As you can see in line 15 of ProducerExtensions:

```csharp
Task.Run(() => producer.ProduceAsync(
    ...
    batch.StreamNamespace,                          // this is the target topic
    new Message<byte[], KafkaBatchContainer>
    ...
        Key = batch.StreamGuid.ToByteArray(),       // this is the target partition inside the topic
    ...
```

Assuming we have a system that maps the stream namespace to the aggregate type and the stream GUID to the actor's grain ID, we would end up with one partition per actor instance, which is exactly what I need.

Problem: when producing a message, Kafka runs the message key through a murmur2 hash function and takes the result modulo the topic's partition count to pick the final partition. Because a topic has only a fixed number of partitions, multiple stream GUIDs end up mapped to the same partition, which leads to consumers receiving data from two actors instead of one.
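To make the key-to-partition step concrete, here is a minimal C# sketch that ports the murmur2 variant used by the Kafka Java client's default partitioner (`Utils.murmur2`, followed by `toPositive(hash) % numPartitions`). Treat it as an illustration under that assumption: Confluent's .NET client wraps librdkafka, whose default partitioner is the CRC32-based `consistent_random`, and murmur2 is only used there if `ProducerConfig.Partitioner` is set to `Partitioner.Murmur2` or `Partitioner.Murmur2Random`. `KafkaPartitioning`, `Murmur2`, and `PartitionFor` are hypothetical names.

```csharp
using System;

public static class KafkaPartitioning
{
    // Port of the murmur2 variant from the Kafka Java client (Utils.murmur2).
    // uint arithmetic gives the logical right shifts that Java's >>> performs.
    public static int Murmur2(byte[] data)
    {
        unchecked
        {
            const uint m = 0x5bd1e995;
            const int r = 24;
            int length = data.Length;
            uint h = 0x9747b28c ^ (uint)length; // seed XOR length

            // Mix the input four bytes at a time (little-endian).
            int length4 = length / 4;
            for (int i = 0; i < length4; i++)
            {
                int i4 = i * 4;
                uint k = (uint)(data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24));
                k *= m;
                k ^= k >> r;
                k *= m;
                h *= m;
                h ^= k;
            }

            // Mix the remaining 1-3 bytes (emulates Java's switch fall-through).
            int tail = length & ~3;
            int rem = length % 4;
            if (rem >= 3) h ^= (uint)data[tail + 2] << 16;
            if (rem >= 2) h ^= (uint)data[tail + 1] << 8;
            if (rem >= 1) { h ^= data[tail]; h *= m; }

            // Final avalanche.
            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;
            return (int)h;
        }
    }

    // Clear the sign bit, then reduce modulo the partition count.
    public static int PartitionFor(byte[] key, int numPartitions)
    {
        if (numPartitions <= 0) throw new ArgumentOutOfRangeException(nameof(numPartitions));
        return (Murmur2(key) & 0x7fffffff) % numPartitions;
    }
}
```

For a stream key you would call `KafkaPartitioning.PartitionFor(streamGuid.ToByteArray(), 12)` for a 12-partition topic. Because the result is one of only `numPartitions` values, any set of more than `numPartitions` stream GUIDs must contain at least two that share a partition (pigeonhole principle). This is inherent to key-based partitioning, not a weakness of murmur2 specifically.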

I would be interested to know whether anybody else uses the grain ID as the stream ID, and how you got around this issue.
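One way to tolerate several stream GUIDs sharing a partition is to demultiplex on the consumer side: read each key back into a GUID with `new Guid(byte[])`, which is the inverse of `Guid.ToByteArray()`, and group records by it. This is a minimal sketch, not a description of what Orleans.Streams.Kafka does internally; `ConsumedRecord` and `StreamDemultiplexer` are hypothetical names, and a real consumer would take keys from Confluent's `ConsumeResult` rather than an in-memory list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a consumed Kafka message: the raw key bytes
// (produced by Guid.ToByteArray()) plus an opaque payload.
public sealed record ConsumedRecord(byte[] Key, string Payload);

public static class StreamDemultiplexer
{
    // Groups records read from one partition by the stream GUID encoded in
    // their key, so each stream (actor) only receives its own messages even
    // when several GUIDs land on the same partition.
    public static Dictionary<Guid, List<string>> GroupByStream(IEnumerable<ConsumedRecord> records)
    {
        var byStream = new Dictionary<Guid, List<string>>();
        foreach (var record in records)
        {
            // new Guid(byte[]) reverses Guid.ToByteArray().
            var streamGuid = new Guid(record.Key);
            if (!byStream.TryGetValue(streamGuid, out var payloads))
            {
                payloads = new List<string>();
                byStream[streamGuid] = payloads;
            }
            // Appending in read order keeps each stream's messages in partition order.
            payloads.Add(record.Payload);
        }
        return byStream;
    }
}
```

Since Kafka only guarantees ordering within a partition, grouping in read order preserves the per-stream order each actor would expect.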