pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Join on multiple keys with Lazy DataFrames panics #17004

Open sdrap opened 1 week ago

sdrap commented 1 week ago

A puzzling panic for multi-column joins with lazy DataFrames.

Reproducible example

Joining two dataframes on 2 keys is ok

let df1 = df!("key1" => &["A", "B", "C"],
              "key2" => &["X", "Y", "Z"],
              "val1" => &[1, 2, 3])?;
let df2 = df!("key1" => &["A", "B", "D"],
              "key2" => &["X", "Y", "W"],
              "val2" => &[4, 5, 6])?;

// join on "key1" and "key2"
let joined_df = df1.join(
    &df2,
    &["key1", "key2"],
    &["key1", "key2"],
    JoinArgs::new(JoinType::Inner),
)?;
println!("{:?}", joined_df);

Lazy joining on one key is ok

// join on "key1" only
let lazy1_joined_df = df1.clone().lazy().join(
    df2.clone().lazy(),
    [col("key1")],
    [col("key1")],
    JoinArgs::new(JoinType::Inner),
).collect()?;
println!("{:?}", lazy1_joined_df);

Lazy joining on two keys panics

// join on "key1" and "key2"
let lazy2_joined_df = df1.clone().lazy().join(
    df2.clone().lazy(),
    [cols(["key1", "key2"])],
    [cols(["key1", "key2"])],
    JoinArgs::new(JoinType::Inner),
).collect()?;
println!("{:?}", lazy2_joined_df);

// Same for this one
let lazy_joined_df = df1
    .clone()
    .lazy()
    .join(
        df2.clone().lazy(),
        [col("key1"), col("key2")],
        [col("key1"), col("key2")],
        JoinArgs::new(JoinType::Inner),
    )
    .collect()?;

Log output

No response

Issue description

Joining two DataFrames on multiple keys works in eager (non-lazy) mode, but the lazy equivalent panics, whether the keys are passed via cols or as a list of col expressions.

Expected behavior

I expect the same output from the eager and lazy APIs, whether joining on one key or on several.
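For reference, the rows that an inner join on both keys should return for the sample data can be worked out by hand. The sketch below uses only the Rust standard library, not Polars; `Row`, `sample_frames`, and `inner_join_on_two_keys` are hypothetical names introduced purely for illustration.

```rust
use std::collections::HashMap;

/// One row of a sample frame: (key1, key2, value).
type Row = (&'static str, &'static str, i32);

/// The same data as df1 and df2 in the reproducible example.
fn sample_frames() -> (Vec<Row>, Vec<Row>) {
    let df1 = vec![("A", "X", 1), ("B", "Y", 2), ("C", "Z", 3)];
    let df2 = vec![("A", "X", 4), ("B", "Y", 5), ("D", "W", 6)];
    (df1, df2)
}

/// Hypothetical helper (not a Polars API): inner join on the composite
/// (key1, key2) key. A row pair matches only when both keys are equal.
/// Output rows are (key1, key2, val1, val2), in left-hand row order.
fn inner_join_on_two_keys(
    left: &[Row],
    right: &[Row],
) -> Vec<(&'static str, &'static str, i32, i32)> {
    // Index the right-hand rows by their composite key.
    let mut index: HashMap<(&str, &str), Vec<i32>> = HashMap::new();
    for &(k1, k2, v) in right {
        index.entry((k1, k2)).or_default().push(v);
    }

    // Probe the index with every left-hand row.
    let mut out = Vec::new();
    for &(k1, k2, v1) in left {
        if let Some(vals) = index.get(&(k1, k2)) {
            for &v2 in vals {
                out.push((k1, k2, v1, v2));
            }
        }
    }
    out
}
```

Calling `inner_join_on_two_keys` on the two vectors from `sample_frames()` yields the rows `("A", "X", 1, 4)` and `("B", "Y", 2, 5)`. The `C`/`Z` and `D`/`W` rows have no partner on the other side, so they are dropped.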

Installed versions

ritchie46 commented 1 week ago

What kind of panic do you get? Can you show the stack trace?

sdrap commented 1 week ago

With RUST_BACKTRACE=1 I get this (for the first failing example, the one using cols):

thread 'main' panicked at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/expr_to_ir.rs:367:33:
no `columns` expected at this point
stack backtrace:
   0: rust_begin_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
   2: polars_plan::logical_plan::conversion::expr_to_ir::to_aexpr_impl::{{closure}}
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/expr_to_ir.rs:367:33
   3: stacker::maybe_grow
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:55:9
   4: polars_plan::logical_plan::conversion::expr_to_ir::to_aexpr_impl
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/expr_to_ir.rs:108:1
   5: polars_plan::logical_plan::conversion::expr_to_ir::to_aexpr_impl_materialized_lit
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/expr_to_ir.rs:104:5
   6: polars_plan::logical_plan::conversion::expr_to_ir::to_aexpr
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/expr_to_ir.rs:26:5
   7: polars_plan::dsl::expr::Expr::to_field_amortized
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/dsl/expr.rs:313:20
   8: polars_plan::logical_plan::schema::det_join_schema
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/schema.rs:317:29
   9: polars_plan::logical_plan::conversion::dsl_to_ir::to_alp_impl::{{closure}}
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/dsl_to_ir.rs:354:17
  10: stacker::maybe_grow
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:55:9
  11: polars_plan::logical_plan::conversion::dsl_to_ir::to_alp_impl
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/dsl_to_ir.rs:59:1
  12: polars_plan::logical_plan::conversion::dsl_to_ir::to_alp
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/conversion/dsl_to_ir.rs:53:5
  13: polars_plan::logical_plan::optimizer::optimize
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-plan-0.40.0/src/logical_plan/optimizer/mod.rs:94:22
  14: polars_lazy::frame::LazyFrame::optimize_with_scratch
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-lazy-0.40.0/src/frame/mod.rs:542:22
  15: polars_lazy::frame::LazyFrame::prepare_collect_post_opt
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-lazy-0.40.0/src/frame/mod.rs:595:13
  16: polars_lazy::frame::LazyFrame::_collect_post_opt
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-lazy-0.40.0/src/frame/mod.rs:616:49
  17: polars_lazy::frame::LazyFrame::collect
             at /home/sdrapeau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-lazy-0.40.0/src/frame/mod.rs:646:9
  18: playground::main
             at ./src/main.rs:27:27
  19: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
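When reporting or regression-testing a panic like the one above, it can help to capture the panic message in code rather than copying it from the terminal. The following is a minimal sketch using only the standard library; `panic_message` is a hypothetical helper, and the closure passed to it stands in for whatever call triggers the panic (here, the lazy `collect()`).

```rust
use std::panic;

/// Hypothetical helper: run `f` and, if it panics, return the panic
/// message as a String. Returns None when `f` completes normally.
fn panic_message<F>(f: F) -> Option<String>
where
    F: FnOnce() + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(()) => None,
        Err(payload) => {
            // Panic payloads are usually a &str (literal message)
            // or a String (formatted message).
            if let Some(s) = payload.downcast_ref::<&str>() {
                Some(s.to_string())
            } else if let Some(s) = payload.downcast_ref::<String>() {
                Some(s.clone())
            } else {
                Some("non-string panic payload".to_string())
            }
        }
    }
}
```

For example, `panic_message(|| panic!("boom"))` returns `Some("boom".to_string())`. This only works when panics unwind, which is the default; it does not apply to builds configured with `panic = "abort"`. The default panic hook still prints the message (and the backtrace, if `RUST_BACKTRACE` is set) to stderr.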