rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0

Integrate `joinable`? #609

Open aeshirey opened 2 years ago

aeshirey commented 2 years ago

I wrote a small crate, `joinable`, to do relational joins between two iterables. Someone suggested it might be a good addition to itertools, so I'd like to ask: is this functionality you'd like me to roll in?

Example usage:

```rust
use joinable::Joinable;
let joined = customers
    .iter()
    .inner_join(&orders[..], |cust, ord| cust.id.cmp(&ord.customer_id))
    .map(|(cust, ords)| {
        // Translate from (&Customer, Vec<&Order>)
        (
            &cust.name,
            ords.iter().map(|ord| ord.amount_usd).sum::<f32>(),
        )
    })
    .collect::<Vec<_>>();
```
scottmcm commented 2 years ago

Hmm, Itertools has similar functionality, but it groups it differently.

For example, it has `merge_join_by(...).filter_map(EitherOrBoth::both)`, which is essentially an inner join, albeit one that treats cardinality differently from SQL (in order to avoid cloning) and needs ordered input for efficiency. (But it looks like your `inner_join` isn't quite what I'd expect from SQL either: a SQL inner join would give an `Iterator<Item = (&Customer, &Order)>`, so giving a `Vec` means it's also doing a `GROUP BY`.)
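To make the cardinality point concrete, here is a std-only sketch (not itertools' actual code) contrasting the one-to-one pairing that `merge_join_by(...).filter_map(EitherOrBoth::both)` performs on sorted input with a SQL-style nested-loop inner join. The function names `merge_inner_join` and `sql_inner_join` are hypothetical, and both take slices purely to keep the sketch short.

```rust
use std::cmp::Ordering;

// Hypothetical sketch of the pairing produced by
// `merge_join_by(..).filter_map(EitherOrBoth::both)`: walk two inputs that are
// already sorted by key in lockstep; on equal keys, emit ONE pair and advance BOTH.
fn merge_inner_join<'a, L, R>(
    left: &'a [L],
    right: &'a [R],
    cmp: impl Fn(&L, &R) -> Ordering,
) -> Vec<(&'a L, &'a R)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        match cmp(&left[i], &right[j]) {
            Ordering::Less => i += 1,    // left-only item: dropped by the inner join
            Ordering::Greater => j += 1, // right-only item: dropped as well
            Ordering::Equal => {
                out.push((&left[i], &right[j]));
                i += 1;
                j += 1;
            }
        }
    }
    out
}

// Hypothetical SQL-style inner join (nested loop): every matching
// (left, right) combination, so duplicate keys on both sides multiply.
// Input order does not matter here.
fn sql_inner_join<'a, L, R>(
    left: &'a [L],
    right: &'a [R],
    cmp: impl Fn(&L, &R) -> Ordering,
) -> Vec<(&'a L, &'a R)> {
    let cmp = &cmp; // shared reference so the inner `move` closures can copy it
    left.iter()
        .flat_map(|l| {
            right
                .iter()
                .filter(move |&r| cmp(l, r) == Ordering::Equal)
                .map(move |r| (l, r))
        })
        .collect()
}
```

With keys `[1, 1, 2]` on the left and `[1, 1, 3]` on the right, the merge-style version yields 2 pairs while the nested loop yields 4. The merge-style version only gives correct results when both inputs are sorted by the compared key, but it never needs to revisit (or clone) an element.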

And it has `.into_grouping_map().sum()` for the "gather everything by key and sum the values" case.
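For the summing half of the opening example, the following std-only sketch shows what the `.into_grouping_map().sum()` chain computes: it folds an iterator of `(key, value)` pairs into a map from each key to the sum of its values. `sum_by_key` is a hypothetical name, and this version assumes `V: Default` as a starting value; it is not itertools' implementation.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::AddAssign;

// Hypothetical std-only equivalent of grouping `(key, value)` pairs by key
// and summing each group's values.
fn sum_by_key<K, V, I>(pairs: I) -> HashMap<K, V>
where
    I: IntoIterator<Item = (K, V)>,
    K: Eq + Hash,
    V: AddAssign + Default,
{
    let mut sums = HashMap::new();
    for (k, v) in pairs {
        // Start each new key at V::default(), then accumulate into it.
        *sums.entry(k).or_insert_with(V::default) += v;
    }
    sums
}
```

Feeding it `(customer_id, amount_usd)` pairs gives per-customer totals directly; the join with `Customer` (to get names) is then a separate step, which is the different factoring described above.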

Maybe some more examples would help? But the RHS always needing to be a slice doesn't say itertools to me...

aeshirey commented 2 years ago

> it giving a `Vec` means it's also doing a `GROUP BY`.

Good point. This was the desired behavior for my use case, but it may not be for others. At a minimum, I'll use this as feedback for improving my crate.

> But the RHS always needing to be a slice doesn't say itertools to me...

True. My intent was that the LHS is an iterator that can join against any RHS slice without consuming it, because each right-hand record might match multiple left-hand records, and the ordering of the RHS isn't necessarily known or required.
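A minimal std-only sketch of that design, assuming a comparator that returns `Ordering` as in the opening example: the left side is any iterator, the right side is a borrowed slice that is rescanned for every left item, and each surviving left item carries a `Vec` of all its matches. `grouped_inner_join` and the `Customer`/`Order` types are hypothetical stand-ins, not `joinable`'s actual implementation; the linear scan per left item makes it O(n·m).

```rust
use std::cmp::Ordering;

// Hypothetical record types modeled on the opening example.
struct Customer {
    id: u32,
    name: String,
}

struct Order {
    customer_id: u32,
    amount_usd: f32,
}

// Hypothetical grouped inner join (not joinable's real code): for each left
// item, collect every right item for which `key_cmp` returns `Equal`, and drop
// left items with no match. `right` is only borrowed, so it is rescanned for
// each left item and the same right record can land in several groups.
fn grouped_inner_join<'a, L, R, I, F>(
    left: I,
    right: &'a [R],
    key_cmp: F,
) -> impl Iterator<Item = (L, Vec<&'a R>)> + 'a
where
    I: Iterator<Item = L> + 'a,
    L: Copy + 'a,
    R: 'a,
    F: Fn(L, &R) -> Ordering + 'a,
{
    left.filter_map(move |l| {
        let matches: Vec<&'a R> = right
            .iter()
            .filter(|&r| key_cmp(l, r) == Ordering::Equal)
            .collect();
        if matches.is_empty() {
            None
        } else {
            Some((l, matches))
        }
    })
}
```

Dropping unmatched left items is the inner-join part; returning a `Vec<&R>` per left item is the `GROUP BY` part discussed earlier in the thread. No ordering of either input is required, at the cost of scanning the whole slice for each left item.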

I'm happy to provide more examples if that helps, but if the behavior (currently: grouped records from the RHS, and an iterator + slice instead of two iterators) doesn't fit here, that's fine too.