phaistos-networks / TANK

A very high performance distributed log service
Apache License 2.0

Merge-join multiple streams #45

Open markpapadakis opened 7 years ago

markpapadakis commented 7 years ago

It should be easy to implement a TankClient utility method that joins multiple streams by message timestamp using a simple merge strategy.
For 1+ streams, we can consume and buffer messages from each stream, 'pop' the earliest message across all of the buffered streams, and refill a stream's buffer when needed.
This should make it easy to consume many different streams as if they were a single stream while retaining time-based ordering (strict ordering isn't guaranteed, but it will almost always hold in practice). This can work for multiple partitions of the same topic, or for multiple partitions across multiple topics.
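
A minimal sketch of the merge strategy described above, for illustration only. It does not use the real TankClient API: `Message`, `FetchFn`, and `MergedStream` are hypothetical names, and each stream is modelled as a callback that returns the next batch of messages for one (topic, partition), or an empty batch once it has nothing left. A min-heap keyed on the timestamp of each stream's buffered head message picks the earliest message, and a stream is refilled when its buffer runs dry. The output is only as ordered as the inputs: if a single stream is not itself in timestamp order, the merged result will not be strictly ordered either.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; not TANK's actual message struct.
struct Message {
    uint64_t ts;          // message timestamp
    std::string payload;
};

// Hypothetical per-stream source: returns the next batch for one
// (topic, partition), or an empty vector when nothing more is available.
// A real client would wrap a consume request here.
using FetchFn = std::function<std::vector<Message>()>;

class MergedStream {
public:
    explicit MergedStream(std::vector<FetchFn> sources) {
        for (auto &fetch : sources)
            streams_.push_back({std::move(fetch), {}, false});
        // Prime every stream and register its head message in the heap.
        for (std::size_t i = 0; i < streams_.size(); ++i)
            if (refill(i))
                heap_.push({streams_[i].buf.front().ts, i});
    }

    // Pops the earliest buffered message across all streams into `out`.
    // Returns false once every stream is drained.
    bool next(Message &out) {
        if (heap_.empty())
            return false;
        const std::size_t i = heap_.top().second;
        heap_.pop();
        auto &s = streams_[i];
        out = std::move(s.buf.front());
        s.buf.pop_front();
        // Re-register the stream if it still has (or can fetch) messages.
        if (!s.buf.empty() || refill(i))
            heap_.push({s.buf.front().ts, i});
        return true;
    }

private:
    struct Stream {
        FetchFn fetch;
        std::deque<Message> buf;
        bool exhausted;
    };
    using Head = std::pair<uint64_t, std::size_t>; // (timestamp, stream index)

    // Fetches one more batch for stream i; returns false if it is drained.
    bool refill(std::size_t i) {
        auto &s = streams_[i];
        if (s.exhausted)
            return false;
        auto batch = s.fetch();
        if (batch.empty()) {
            s.exhausted = true;
            return false;
        }
        for (auto &m : batch)
            s.buf.push_back(std::move(m));
        return true;
    }

    std::vector<Stream> streams_;
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap_;
};
```

A caller would construct `MergedStream` with one `FetchFn` per partition and loop on `next()` until it returns false. Treating an empty batch as end-of-stream is a simplification of this sketch; a consumer tailing live partitions would more likely wait or retry instead of marking the stream exhausted.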

markpapadakis commented 7 years ago

There is a class like that, in heavy use here, but it hasn't been merged into the client because we'll eventually provide a Kafka Streams-like abstraction that can also be used for joins.