Closed LiamConnors closed 1 month ago
Thanks @FBruzzesi! It does sound like it might be what we want. I'll try it out.
Thanks @FBruzzesi! This has the tests passing now. I was wondering about this here: https://github.com/narwhals-dev/narwhals/pull/1041/files#diff-1bb9576cb79de69b2e45bcbc828669f54a6049dbdb76407986b3439ba309369dR481-R487 What situation would cause us to create extra columns in the join? Thanks for your feedback!
Thanks @FBruzzesi! This has the tests passing now.
Thanks! That's awesome 🚀
I was wondering about this here: https://github.com/narwhals-dev/narwhals/pull/1041/files#diff-1bb9576cb79de69b2e45bcbc828669f54a6049dbdb76407986b3439ba309369dR481-R487 What situation would cause us to create extra columns in the join? Thanks for your feedback!
In polars anti join does not add columns in other that match the keys (as it would not know where to add those). However, since the way we replicate that in pandas is via a outer join, columns would be added. To avoid such behavior we are considering solely the column in the keys for other
. Does this help?
Thanks @FBruzzesi! This has the tests passing now.
Thanks! That's awesome 🚀
I was wondering about this here: https://github.com/narwhals-dev/narwhals/pull/1041/files#diff-1bb9576cb79de69b2e45bcbc828669f54a6049dbdb76407986b3439ba309369dR481-R487 What situation would cause us to create extra columns in the join? Thanks for your feedback!
In polars anti join does not add columns in other that match the keys (as it would not know where to add those). However, since the way we replicate that in pandas is via a outer join, columns would be added. To avoid such behavior we are considering solely the column in the keys for
other
. Does this help?
I think so. Does that mean it might technically be possible to rework the cuDF implementation to not use it? Just passing the original other data frame to the cuDF constructor didn't work, so I was curious.
I think so. Does that mean it might technically be possible to rework the cuDF implementation to not use it? Just passing the original other data frame to the cuDF constructor didn't work, so I was curious.
From the cudf docs:
leftsemi : similar to inner join, but only returns columns from the left dataframe and ignores all columns from the right dataframe.
leftanti : returns only rows columns from the left dataframe for non-matched records. This is exact opposite to leftsemi join.
I would deduce that it is taken care of within the method itself, so possibly the step of slicing the columns and dropping duplicates is relevant for non CUDF implementation only
I think so. Does that mean it might technically be possible to rework the cuDF implementation to not use it? Just passing the original other data frame to the cuDF constructor didn't work, so I was curious.
From the cudf docs:
leftsemi : similar to inner join, but only returns columns from the left dataframe and ignores all columns from the right dataframe. leftanti : returns only rows columns from the left dataframe for non-matched records. This is exact opposite to leftsemi join.
I would deduce that it is taken care of within the method itself, so possibly the step of slicing the columns and dropping duplicates is relevant for non CUDF implementation only
Ah ok I see. Thanks for your help. I'll take another look at that. Sorry. That was my misunderstanding @FBruzzesi
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.
Right now these tests fail with
NotImplementedError: Only indicator=False is currently supported
So the current implementation ofdf.join
how="anti"
won't work.https://docs.rapids.ai/api/cudf/stable/user_guide/api_docs/api/cudf.dataframe.merge/
Seemed like best to mark it as expected to fail for now. @MarcoGorelli what do you think would make sense here. Maybe a note in the docs and a follow up issue to see if we can implement it in another way for cuDF?