Open ritchie46 opened 3 weeks ago
Hoo boy, this one is going to break some code.
There should be a `preserve_order` attribute added, defaulting to `None`, which can be set to `"left"` or `"right"`.
@s-banach Without breaking this promise, the streaming join will be slow by default, because you can't do a partitioned join if you must preserve order. Or at least, it would require a slow re-combining and re-sorting step afterwards.
And if order is preserved, we can't switch which side of the join is the build side and which is the probe side either, in streaming. That's something we'd like to be able to do in the future, as you'd much rather have a small build side.
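To make the trade-off above concrete, here is a minimal pure-Python sketch of a partitioned hash join. It is not Polars' implementation: the function name `partitioned_join`, the row layout, and the `preserve_order` argument are all hypothetical and only illustrate the idea. Each partition builds its hash table on whichever side is smaller, so output order depends on partitioning and on which side was probed; restoring order needs an extra sort over the combined result.

```python
from collections import defaultdict


def partitioned_join(left, right, n_partitions=2, preserve_order=None):
    """Inner-join two lists of (key, value) rows, one partition at a time.

    `preserve_order` is a hypothetical parameter: None, "left" or "right".
    Keys are assumed to be integers so that hashing is deterministic.
    """
    if preserve_order not in (None, "left", "right"):
        raise ValueError("preserve_order must be None, 'left' or 'right'")

    # Tag every row with its original position so order can be restored later.
    left_parts, right_parts = defaultdict(list), defaultdict(list)
    for i, (key, value) in enumerate(left):
        left_parts[hash(key) % n_partitions].append((i, key, value))
    for i, (key, value) in enumerate(right):
        right_parts[hash(key) % n_partitions].append((i, key, value))

    pairs = []
    for p in range(n_partitions):
        lp, rp = left_parts[p], right_parts[p]
        # Free choice of build side: take the smaller one in this partition.
        build_is_left = len(lp) <= len(rp)
        build, probe = (lp, rp) if build_is_left else (rp, lp)

        table = defaultdict(list)
        for row in build:
            table[row[1]].append(row)

        for probe_row in probe:
            for build_row in table.get(probe_row[1], ()):
                if build_is_left:
                    pairs.append((build_row, probe_row))
                else:
                    pairs.append((probe_row, build_row))

    # The extra re-sorting step that order preservation forces on us.
    if preserve_order == "left":
        pairs.sort(key=lambda pair: (pair[0][0], pair[1][0]))
    elif preserve_order == "right":
        pairs.sort(key=lambda pair: (pair[1][0], pair[0][0]))

    return [(l[1], l[2], r[2]) for l, r in pairs]
```

With `preserve_order=None` the rows come out grouped by partition, which is the cheap default argued for above; asking for `"left"` or `"right"` pays for a sort over the full result.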
I think we can already add this `preserve_order` parameter and implement it before 2.0 hits.
Yes, it should be called `maintain_order` then. We already use that name.
Description
This shouldn't have been guaranteed; it should have been left as an implementation detail.
https://github.com/pola-rs/polars/blob/c3c38a9ddc13d7b0b0d1c413f5183c1ee8b06709/py-polars/polars/lazyframe/frame.py#L4443
Link
No response