Open Hoeze opened 9 months ago
Merging sorted dataframes on a struct-type key is not implemented yet:
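import polars as pl

test_df = pl.DataFrame(
    {
        "idx_1": [1, 2, 3, 1, 2, 3],
        "idx_2": [4, 4, 5, 5, 6, 6],
        "value": [1, 2, 3, 4, 5, 6],
    }
)
# Build a struct-typed key from both index columns and sort by it
test_df = test_df.with_columns(key=pl.struct('idx_1','idx_2')).sort('key')
# Panics with "not implemented" for a struct-typed key
test_df.merge_sorted(test_df, key="key")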
thread '<unnamed>' panicked at crates/polars-ops/src/frame/join/merge_sorted.rs:144:13:
not implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[5], line 10
      1 test_df = pl.DataFrame(
      2     {
      3         "idx_1": [1, 2, 3, 1, 2, 3],
   (...)
      6     }
      7 )
      8 test_df = test_df.with_columns(key=pl.struct('idx_1','idx_2')).sort('key')
---> 10 test_df.merge_sorted(test_df, key="key")

File /opt/anaconda/lib/python3.10/site-packages/polars/dataframe/frame.py:10221, in DataFrame.merge_sorted(self, other, key)
  10157 def merge_sorted(self, other: DataFrame, key: str) -> DataFrame:
  10158     """
  10159     Take two sorted DataFrames and merge them by the sorted key.
  10160
   (...)
  10219     └────────┴─────┘
  10220     """
> 10221     return self.lazy().merge_sorted(other.lazy(), key).collect(_eager=True)

File /opt/anaconda/lib/python3.10/site-packages/polars/lazyframe/frame.py:1706, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, _eager)
   1693     comm_subplan_elim = False
   1695 ldf = self._ldf.optimization_toggle(
   1696     type_coercion,
   1697     predicate_pushdown,
   (...)
   1704     _eager,
   1705 )
-> 1706 return wrap_df(ldf.collect())

PanicException: not implemented
See also https://github.com/pola-rs/polars/issues/10935#issuecomment-1879718671
I tried this with the latest polars, v0.22.2.
This functionality would be very helpful, also because merge_sorted only supports a single key column, and structs are a natural way to sort by multiple indices at once.
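To make the requested behaviour concrete, here is a minimal pure-Python sketch of a sorted merge on a composite key. It is not polars code: it uses the standard library's heapq.merge, and it assumes that a struct key orders rows lexicographically by its fields, the same way a Python tuple (idx_1, idx_2) does. The helper name composite_key is hypothetical and only for illustration.

```python
import heapq

# Rows mirror the example data above as (idx_1, idx_2, value) tuples.
rows = list(zip([1, 2, 3, 1, 2, 3], [4, 4, 5, 5, 6, 6], [1, 2, 3, 4, 5, 6]))


def composite_key(row):
    # Hypothetical helper: a tuple standing in for the struct key.
    # Assumption: struct ordering is lexicographic over (idx_1, idx_2).
    return (row[0], row[1])


# Both inputs must already be sorted by the key, as merge_sorted requires.
left = sorted(rows, key=composite_key)
right = sorted(rows, key=composite_key)

# Merge the two sorted inputs in a single pass without re-sorting.
merged = list(heapq.merge(left, right, key=composite_key))

for row in merged:
    print(row)
```

Since both inputs are the same frame here, each row appears twice in the result, and the output stays ordered by (idx_1, idx_2).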
This is still an issue with v1.0.