utdemir / distributed-dataset

A distributed data processing framework in Haskell.
BSD 3-Clause "New" or "Revised" License

Joins #2

Open utdemir opened 5 years ago

utdemir commented 5 years ago

Implement a join function which joins two datasets based on a key/key function. Different join types (left/right/full outer, cartesian product) should be supported.
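To make the requested join types concrete, here is a minimal sketch over plain in-memory lists. It does not use distributed-dataset's API and says nothing about how a distributed version would shuffle or partition data. Every name in it (`Joined`, `groupByKey`, `fullOuterJoin`, `leftOuterJoin`, `rightOuterJoin`, `cartesian`) is hypothetical and chosen only for illustration. It assumes keys have an `Ord` instance and relies only on `base` and `containers`.

```haskell
import qualified Data.Map.Strict as M
import Data.Maybe (mapMaybe)

-- Hypothetical result type: a row found only on the left, only on the
-- right, or a matched pair of rows sharing a key.
data Joined a b = LeftOnly a | RightOnly b | Both a b
  deriving (Eq, Show)

-- Group rows by the result of the key function, keeping the input order
-- within each group.
groupByKey :: Ord k => (a -> k) -> [a] -> M.Map k [a]
groupByKey key = foldr (\x -> M.insertWith (++) (key x) [x]) M.empty

-- Full outer join: every row from both sides appears at least once.
-- Results come out ordered by key.
fullOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [Joined a b]
fullOuterJoin keyL keyR ls rs = concatMap expand (M.elems merged)
  where
    lefts  = M.map (\xs -> (xs, [])) (groupByKey keyL ls)
    rights = M.map (\ys -> ([], ys)) (groupByKey keyR rs)
    merged = M.unionWith (\(xs, _) (_, ys) -> (xs, ys)) lefts rights
    expand (xs, []) = map LeftOnly xs
    expand ([], ys) = map RightOnly ys
    expand (xs, ys) = [Both x y | x <- xs, y <- ys]

-- Left outer join: keep every left row; unmatched ones pair with Nothing.
leftOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, Maybe b)]
leftOuterJoin keyL keyR ls rs = mapMaybe keep (fullOuterJoin keyL keyR ls rs)
  where
    keep (LeftOnly x)  = Just (x, Nothing)
    keep (Both x y)    = Just (x, Just y)
    keep (RightOnly _) = Nothing

-- Right outer join: the mirror image of leftOuterJoin.
rightOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(Maybe a, b)]
rightOuterJoin keyL keyR ls rs = mapMaybe keep (fullOuterJoin keyL keyR ls rs)
  where
    keep (RightOnly y) = Just (Nothing, y)
    keep (Both x y)    = Just (Just x, y)
    keep (LeftOnly _)  = Nothing

-- Cartesian product: every left row paired with every right row, no keys.
cartesian :: [a] -> [b] -> [(a, b)]
cartesian ls rs = [(x, y) | x <- ls, y <- rs]
```

For example, loading this in GHCi and evaluating `leftOuterJoin fst fst [(1, "ann"), (2, "bob")] [(2, "book"), (3, "pen")]` keeps both left rows and pairs only the row with key 2 with a match. Deriving the left and right joins from the full outer join keeps the sketch short; a real implementation would likely avoid materialising rows it then discards.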

axman6 commented 5 years ago

The generalised joins in discrimination might be useful for this - probably not directly, but it should be possible to leverage the implementation there and take advantage of the fast Grouping work.

utdemir commented 5 years ago

@axman6 Thank you for the suggestion! I haven't used discrimination before, but it does look like it could be useful to distributed-dataset for several future features (joins, shuffles, sorts, ...).

I will look at it thoroughly when I get to implementing those. You are also always welcome to give it a go :)