pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Getting different results when dividing an f64::MIN column value by an f64::MIN literal when the column has more than one value #20038

Open rluvaton opened 4 days ago

rluvaton commented 4 days ago


Reproducible example

#[cfg(test)]
mod tests {
    use polars::{
        df,
        error::PolarsResult,
        prelude::{col, lit, DataFrame, DataType, IntoLazy},
    };

    #[test]
    fn should_work_with_div() -> PolarsResult<()> {
        let min = f64::MIN;
        let expected = min / min;

        let first_item_when_there_is_1_row = get_first_item(df! {
            "a" => [min]
        }?);

        let first_item_when_there_is_2_rows = get_first_item(df! {
            "a" => [min, 0f64]
        }?);

        assert_eq!(first_item_when_there_is_1_row, expected);
        assert_eq!(first_item_when_there_is_2_rows, expected); // <-- this fails

        Ok(())
    }

    fn get_first_item(df: DataFrame) -> f64 {
        let polars_expr = df
            .lazy()
            .select([(col("a") / lit(f64::MIN))]); // <-- this needs to be literal

        let arr = polars_expr.collect().unwrap();

        let res = arr.get_columns()[0].f64().unwrap();

        res.iter().collect::<Vec<_>>()[0].unwrap()
    }
}

Log output

$ POLARS_VERBOSE=1 RUST_BACKTRACE=1 cargo test
   Compiling polars_pg v0.1.0 (/~/rust-pg/polars_pg)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running unittests src/lib.rs (/~/rust-pg/target/debug/deps/polars_pg-d2f10e94186c62a6)

running 1 test
test tests::should_work_with_div ... FAILED

failures:

---- tests::should_work_with_div stdout ----
run ProjectionExec
run ProjectionExec
thread 'tests::should_work_with_div' panicked at polars_pg/src/lib.rs:23:9:
assertion `left == right` failed
  left: 0.9999999999999999
 right: 1.0
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:662:5
   1: core::panicking::panic_fmt
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:74:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:367:5
   4: polars_pg::tests::should_work_with_div
             at ./src/lib.rs:23:9
   5: polars_pg::tests::should_work_with_div::{{closure}}
             at ./src/lib.rs:10:34
   6: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

failures:
    tests::should_work_with_div

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.05s

error: test failed, to rerun pass `--lib`

Issue description

Dividing a column that contains f64::MIN by the literal f64::MIN returns 0.9999999999999999 instead of 1.0 when the column has two or more rows. With a single row, the result is 1.0 as expected.

Related to:

Expected behavior

The result should be the same, and correct (1.0 here), regardless of the number of values in the column.
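The expected invariant can be sketched with the standard library alone. This is a minimal model, not Polars code: the column is assumed to be a plain `&[f64]`, and `div_by_scalar` is a hypothetical helper that performs a true division for every element, so the first result is `1.0` whether the column has one row or two.

```rust
/// Hypothetical helper (not a Polars API): divide every element of a
/// column by a broadcast scalar using true IEEE 754 division.
fn div_by_scalar(column: &[f64], divisor: f64) -> Vec<f64> {
    column.iter().map(|&x| x / divisor).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn result_does_not_depend_on_row_count() {
        let min = f64::MIN;
        let one_row = div_by_scalar(&[min], min);
        let two_rows = div_by_scalar(&[min, 0.0], min);

        // x / x is exactly 1.0 for any finite, nonzero x.
        assert_eq!(one_row[0], 1.0);
        // The first element must not change when more rows are added.
        assert_eq!(two_rows[0], one_row[0]);
    }
}
```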

Installed versions

[package]
name = "polars_pg"
version = "0.1.0"
edition = "2021"

[dependencies]
polars = { version = "0.44.2", features = ["lazy"] }

nameexhaustion commented 3 days ago

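This is a side effect of a fast path we have for broadcasted float division: dividing by a scalar is computed as a multiplication by its reciprocal, which does not always round the same way as a true division: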

[src/main.rs:2:5] f64::MIN / f64::MIN = 1.0
[src/main.rs:3:5] f64::MIN * (1.0 / f64::MIN) = 0.9999999999999999
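The discrepancy can be reproduced with the standard library alone. The sketch below contrasts true division with the reciprocal-multiply form shown in the `dbg!` output above; `div_exact`, `div_reciprocal`, `is_normal_power_of_two` and `div_fast_if_exact` are hypothetical names, and this is neither the Polars kernel nor the change in #20047. As one possible compromise (an assumption, not something proposed in this thread), `div_fast_if_exact` keeps the reciprocal path only when the divisor is a normal power of two: then `1.0 / c` is exactly representable, so `x * (1.0 / c)` and `x / c` round the same exact value and give identical results.

```rust
/// Hypothetical: true IEEE 754 division of every element by `c`.
fn div_exact(column: &[f64], c: f64) -> Vec<f64> {
    column.iter().map(|&x| x / c).collect()
}

/// Hypothetical: the reciprocal fast path, computing `x * (1.0 / c)`.
/// `1.0 / c` is itself rounded, so each product can differ from `x / c`.
fn div_reciprocal(column: &[f64], c: f64) -> Vec<f64> {
    let recip = 1.0 / c;
    column.iter().map(|&x| x * recip).collect()
}

/// True when `c` is a normal float with zero mantissa bits, i.e. +/- 2^e;
/// in that case `1.0 / c` is exactly representable.
fn is_normal_power_of_two(c: f64) -> bool {
    const MANTISSA_MASK: u64 = 0x000f_ffff_ffff_ffff;
    c.is_normal() && (c.to_bits() & MANTISSA_MASK) == 0
}

/// Hypothetical compromise: use the reciprocal only when it is exact.
fn div_fast_if_exact(column: &[f64], c: f64) -> Vec<f64> {
    if is_normal_power_of_two(c) {
        div_reciprocal(column, c)
    } else {
        div_exact(column, c)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn reciprocal_path_differs_for_f64_min() {
        let col = [f64::MIN, 0.0];
        assert_eq!(div_exact(&col, f64::MIN)[0], 1.0);
        // Matches the `dbg!` output quoted above.
        assert_eq!(div_reciprocal(&col, f64::MIN)[0], 0.9999999999999999);
        // f64::MIN is not a power of two, so the exact path is taken.
        assert_eq!(div_fast_if_exact(&col, f64::MIN)[0], 1.0);
    }
}
```

For this divisor, `1.0 / f64::MIN` rounds to exactly -2^-1024, so the product with f64::MIN is 1 - 2^-53, which prints as 0.9999999999999999.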

cc @orlp

rluvaton commented 3 days ago

@nameexhaustion I fixed it in #20047. Let me know whether this change is wanted before I add tests.