Open wence- opened 3 months ago
I am thinking about filing a PR for this. Let me know if you have a timeline/prioritization in mind.
I am thinking about filing a PR for this. Let me know if you have a timeline/prioritization in mind.
Def not P0, I can implement on the interpreter side whenever. It's not blocking any milestones atm
Is your feature request related to a problem? Please describe.
In implementing the cudf-polars interpreter, I would like to use
cudf::merge::merge
to implement the polarsmerge_sorted
functionality. Where polars guarantees that for equal keys in the left and right dataframes,merge_sorted
is stable (that is, equal keys, and hence carrier columns, from the left dataframe appear in the result before those in the right frame), libcudf makes no such guarantee. Note that this is only a binary merge.cudf::merge::merge
is n-ary, rather than binary, and uses a priority queue, based on the length of the input (and intermediate) dataframes that are produced. Hence, we can't guarantee that the result is stable wrt the original input (in the sense described above) without some gymnastics.Describe the solution you'd like
I'd like a stable version of
merge
for the binary case. In this case there's no real advantage to be gained by picking the smaller dataframe as the left one (because there's only one round of merges), and so the implementation is cheap.For stable n-ary merge, I think there's no way around adding an additional disambiguating key column to each table that becomes part of the comparator.
Describe alternatives you've considered
Always adding the disambiguating key column.