microsoft / Trill

Trill is a single-node query processor for temporal or streaming data.
MIT License

Random splitter in a multicore setting #160

Open Ohyoukillkenny opened 2 years ago

Ohyoukillkenny commented 2 years ago

Trill supports using multiple threads to process an input stream by setting Config.StreamScheduler = StreamScheduler.OwnedThreads(n), where n is the number of threads.

Suppose the number of threads is 2 and each data item has the form "(key, val)", with keys drawn from {1, 2, 3, ..., 10}. In my observation, when I use the Map-Reduce pattern to compute the sum of values over key-based partitions, Trill always sends items with the same key to the same thread, and the key-to-thread mapping follows a fixed pattern: e.g., keys 1, 3, 5, ... always go to thread 1 and keys 2, 4, ... always go to thread 2.

Does Trill provide a random splitter that would allow an arbitrary key-to-thread correspondence, e.g., keys 1, 2, 5, 7, ... on thread 1 and keys 3, 4, ... on thread 2? Also, can we implement a custom splitter to partition the stream?
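
To make the request concrete, here is a minimal Python sketch of the two behaviors. It is not Trill code and does not use any Trill API. The function names (`modulo_assignment`, `random_assignment`, `partitioned_sum`) are hypothetical, and the assumption that the observed odd/even pattern comes from something like "key modulo thread count" is only a guess that happens to match the observation.

```python
import random


def modulo_assignment(keys, num_threads):
    """Fixed split (assumed model of the observed behavior):
    key k is handled by worker k % num_threads."""
    return {k: k % num_threads for k in keys}


def random_assignment(keys, num_threads, seed):
    """Randomized split: shuffle the keys with a seeded RNG, then deal
    them out round-robin so each worker gets a near-equal share."""
    rng = random.Random(seed)
    shuffled = list(keys)
    rng.shuffle(shuffled)
    return {k: i % num_threads for i, k in enumerate(shuffled)}


def partitioned_sum(items, assignment, num_threads):
    """Sum values per key, keeping one result dict per worker.
    Every item with a given key lands on the same worker."""
    per_worker = [dict() for _ in range(num_threads)]
    for key, val in items:
        bucket = per_worker[assignment[key]]
        bucket[key] = bucket.get(key, 0) + val
    return per_worker


keys = range(1, 11)
# Two items per key: (k, 10 * k) and (k, 1).
items = [(k, 10 * k) for k in keys] + [(k, 1) for k in keys]

fixed = modulo_assignment(keys, 2)              # odd keys on one worker, even on the other
randomized = random_assignment(keys, 2, seed=42)  # arbitrary but reproducible mapping

fixed_sums = partitioned_sum(items, fixed, 2)
random_sums = partitioned_sum(items, randomized, 2)
```

In both variants all items with the same key still reach the same worker, so the per-key sums are identical; only the key-to-worker correspondence changes. That is the property I would want a random or custom splitter in Trill to preserve.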