tomasAlabes / graphql-kafkajs-subscriptions

Apollo graphql subscriptions using kafkajs
MIT License

Lost events as Kafka groupIdPrefix has high likelihood of collisions in large deployments #37

Open simbo1905 opened 4 weeks ago

simbo1905 commented 4 weeks ago

In the following code:

https://github.com/tomasAlabes/graphql-kafkajs-subscriptions/blob/d6642c5ed7ca3a40bfd4eb28facfd04bd9b42be3/src/kafka-pubsub.ts#L69

The generated groupId is very likely to collide, in the same way as in the birthday paradox. With 200 subscriptions there is an 87% probability that at least two GraphQL subscriptions share the same groupId for their Kafka topic subscription. When that happens, Kafka treats the colliding subscriptions as members of one consumer group and splits the events between them, so each event reaches only one of the browsers and the others lose it.
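To make the birthday-paradox arithmetic concrete, here is a minimal sketch that computes the collision probability for a given number of subscriptions. The size of the suffix space (`SUFFIX_SPACE`, about 10,000 values) is an assumption chosen because it reproduces roughly the figure quoted above; check the linked line for the actual range. The function name is hypothetical.

```typescript
// Probability that at least two of `n` subscriptions draw the same suffix
// when each one picks uniformly at random from `space` possible values.
function collisionProbability(n: number, space: number): number {
  let pAllDistinct = 1;
  for (let i = 0; i < n; i++) {
    // The (i + 1)-th subscription must avoid the i suffixes already taken.
    pAllDistinct *= (space - i) / space;
  }
  return 1 - pAllDistinct;
}

// ASSUMPTION: about 10,000 possible suffixes. This is not taken from the
// library source; adjust it to match the real random range.
const SUFFIX_SPACE = 10000;

for (const n of [50, 100, 200, 500]) {
  const p = collisionProbability(n, SUFFIX_SPACE);
  console.log(`${n} subscriptions: ${(p * 100).toFixed(1)}% chance of a collision`);
}
```

The probability grows roughly with the square of the number of subscriptions, which is why a suffix space that looks large is exhausted so quickly.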

The fix would be to add something like a static counter variable to the class and put that counter into the groupId instead of a random number.

In our case we run on Kubernetes. We inject the pod name, which is unique, into each NodeJS process, keep a module-level static counter, and append that counter to the pod name to generate our groupIdPrefix. That way we are sure to get a unique Kafka groupId per browser subscription. This is a workaround. If the bug is fixed we can simply use the pod name as the groupIdPrefix.
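The workaround described above could look roughly like the sketch below. It is not the library's code: `createGroupIdGenerator` and the example pod name are hypothetical, and the instance name is assumed to be unique per running process (for example a pod name injected by the deployment).

```typescript
// Hypothetical helper: returns a function that yields a new group id on
// every call by appending a monotonically increasing counter to a name.
// ASSUMPTION: `instanceName` is unique per running process.
function createGroupIdGenerator(instanceName: string): () => string {
  let counter = 0; // lives for the lifetime of the process
  return () => `${instanceName}-${++counter}`;
}

// Example with a made-up pod name.
const nextGroupId = createGroupIdGenerator("my-pod-7d9f");
console.log(nextGroupId()); // my-pod-7d9f-1
console.log(nextGroupId()); // my-pod-7d9f-2
```

A counter cannot collide within one process, so uniqueness then depends only on the instance name. The counter restarts from zero when the process restarts, so this only guarantees distinct ids among subscriptions created by processes with distinct names.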