gaohao95 opened this issue 1 year ago.
Thank you @gaohao95 for suggesting this. We will do some scoping and return to this request.
There are some important limitations to be aware of.
`cudf::contains`, which is what builds the hash map internally. Therefore, beyond the previous conceptual limitation, supporting this functionality would also require a deeper refactor to expose the hash table in some way.

Thanks @vyasr! Those are good points!
> Therefore, reuse would only be possible within each of two disjoint sets of APIs: `semi_join` could reuse a map built for `anti_join` and vice versa, but it could not use a multimap built for inner/left/full joins.
In my use case (broadcast join) this is fine. An object is only needed to probe a single join type.
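To make the map-versus-multimap distinction concrete, here is a minimal CPU-only sketch in standard C++. It is not libcudf or cuco code: the functions `build_key_set`, `build_key_multimap`, and `inner_join` are hypothetical, keys are assumed to be a single `int64_t` column, and nulls are ignored. A semi or anti join only asks whether a key exists, so a set of distinct keys is enough, while an inner join must emit one pair per matching build row and therefore needs every build-row index for each key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical CPU-only sketch (not libcudf code) of the two kinds of hash tables.

// Semi/anti joins only need "does this key exist?", so distinct keys suffice.
std::unordered_set<std::int64_t> build_key_set(std::vector<std::int64_t> const& build_keys)
{
  return std::unordered_set<std::int64_t>(build_keys.begin(), build_keys.end());
}

// Inner/left/full joins need every build-row index for a key, hence a multimap.
std::unordered_multimap<std::int64_t, std::size_t> build_key_multimap(
  std::vector<std::int64_t> const& build_keys)
{
  std::unordered_multimap<std::int64_t, std::size_t> map;
  for (std::size_t i = 0; i < build_keys.size(); ++i) { map.emplace(build_keys[i], i); }
  return map;
}

// Inner join probe: one (probe_row, build_row) pair per match.
std::vector<std::pair<std::size_t, std::size_t>> inner_join(
  std::unordered_multimap<std::int64_t, std::size_t> const& map,
  std::vector<std::int64_t> const& probe_keys)
{
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t i = 0; i < probe_keys.size(); ++i) {
    auto [first, last] = map.equal_range(probe_keys[i]);
    for (auto it = first; it != last; ++it) { out.emplace_back(i, it->second); }
  }
  return out;
}
```

With duplicate build keys such as `{7, 7, 9}`, the set keeps only two entries and cannot tell which build rows matched, which is why a table built for semi/anti joins cannot serve an inner join.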
> There is ongoing work to refactor cuco data structures and expand their usage within libcudf. I would not recommend making any changes to the join APIs until that work is further along.
This is not a blocker for us so we can wait.
We also needed this recently for a broadcast join implementation. We would prefer it if `cudf::hash_join` supported left-semi joins.
Is your feature request related to a problem? Please describe.
`cudf::hash_join` makes it possible to build the hash table once and probe it multiple times, but it only supports inner, left, and full joins. I wish `cudf::hash_join` could support left-semi and left-anti joins as well.
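The requested build-once, probe-many behavior for semi/anti joins can be sketched as follows. This is a minimal CPU-only illustration in standard C++, not libcudf code: the class `semi_anti_join_table` and its methods are hypothetical, keys are assumed to be a single `int64_t` column, nulls are ignored, and the right-hand table is assumed to be the build side. The hash set of build keys is constructed once and then probed any number of times, returning indices of the left rows that do (semi) or do not (anti) have a match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical CPU-only sketch (not libcudf API): build the hash set once,
// then probe it repeatedly for left-semi and left-anti joins.
class semi_anti_join_table {
 public:
  explicit semi_anti_join_table(std::vector<std::int64_t> const& build_keys)
    : keys_(build_keys.begin(), build_keys.end())
  {
  }

  // Left-semi join: indices of probe rows whose key exists on the build side.
  std::vector<std::size_t> left_semi_join(std::vector<std::int64_t> const& probe_keys) const
  {
    return probe(probe_keys, true);
  }

  // Left-anti join: indices of probe rows whose key does NOT exist on the build side.
  std::vector<std::size_t> left_anti_join(std::vector<std::int64_t> const& probe_keys) const
  {
    return probe(probe_keys, false);
  }

 private:
  // Shared probe loop; keep_matches selects semi (true) or anti (false) semantics.
  std::vector<std::size_t> probe(std::vector<std::int64_t> const& probe_keys,
                                 bool keep_matches) const
  {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < probe_keys.size(); ++i) {
      bool const found = keys_.count(probe_keys[i]) > 0;
      if (found == keep_matches) { out.push_back(i); }
    }
    return out;
  }

  std::unordered_set<std::int64_t> keys_;  // distinct build-side keys, built once
};
```

For example, building from `{1, 3, 5}` and probing with `{1, 2, 3}` yields left-semi indices `{0, 2}` and left-anti indices `{1}`; a second probe with `{5, 6}` reuses the same set without rebuilding it.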