Open wukan1986 opened 1 year ago
I'm planning to tackle this in the rework of qcut to bin_quantiles (for more info see here: https://github.com/pola-rs/polars/issues/10468) by relying on a total ordering of the floats (https://doc.rust-lang.org/std/primitive.f64.html#method.total_cmp). That said, I think it would be wasted work to still fix this in the soon-to-be-deprecated qcut.
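For readers unfamiliar with the idea, here is a minimal Python sketch of what a total ordering of floats buys you. Ordinary comparisons involving NaN are always false, so NaN has no defined position in a sort; a total order gives every value, NaN and signed zeros included, a fixed position. The helper name total_order_key is hypothetical and this is only an illustration of the bit trick that Rust's f64::total_cmp is documented to use, not code from Polars.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a float to an int whose ordering follows the IEEE 754 total order."""
    # Reinterpret the 64 bits of the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # For negative floats (sign bit set), flip every bit except the sign
    # so that larger magnitudes sort lower.
    if bits < 0:
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits


values = [float("nan"), 1.0, float("-inf"), 0.0, -0.0, float("inf")]
ordered = sorted(values, key=total_order_key)

# With this key, -0.0 sorts before 0.0 and a positive NaN sorts last.
assert math.isnan(ordered[-1])
```

Under such an ordering a NaN is just another value with a well-defined rank, which is the property the planned rework relies on.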
It also panics when the df is empty.
Now, on all-null columns, it raises an error instead of panicking:
df = pl.DataFrame({"test": [None]})
df.with_columns(pl.col("test").qcut(5, labels=["q1", "q2", "q3", "q4", "q5"]))
  File "/home/swang/.pyenv/versions/3.11.4/lib/python3.11/site-packages/polars/dataframe/frame.py", line 7872, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/swang/.pyenv/versions/3.11.4/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1700, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ShapeError: Provide nbreaks + 1 labels
Has anyone found a workaround for this issue? I'm facing the same thing with a qcut().over() when all values are null:
pl.col(f).qcut(quantiles=3, labels=["0", "2", "4"], allow_duplicates=True).over("date")
After some investigation I found that the panic happens when you pass labels to qcut while all the data is null.
There's already a test for full null data, but it doesn't check with labels as input:
# this is the existing test
def test_qcut_full_null() -> None:
    s = pl.Series("a", [None, None, None, None])

    result = s.qcut([0.25, 0.50])

    expected = pl.Series("a", [None, None, None, None], dtype=pl.Categorical)
    assert_series_equal(result, expected, categorical_as_str=True)


# the new one - it fails
def test_qcut_full_null_with_labels() -> None:
    s = pl.Series("a", [None, None, None, None])

    result = s.qcut([0.25, 0.50], labels=["1", "2", "3"])

    expected = pl.Series("a", [None, None, None, None], dtype=pl.Categorical)
    assert_series_equal(result, expected, categorical_as_str=True)
The test_qcut_full_null_with_labels test fails due to the same error mentioned in this issue:
FAILED tests/unit/operations/test_qcut.py::test_qcut_full_null_with_labels - polars.exceptions.ShapeError: provide len(quantiles) + 1 labels
The specific line of code that is causing the error is crates/polars-ops/src/series/ops/cut.rs#116:
polars_ensure!(l.len() == breaks.len() + 1, ShapeMismatch: "provide len(quantiles) + 1 labels");
I'll try to fix it myself, but I'm not really a Rust guy.
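To make the label check concrete, here is a rough pure-Python sketch of the invariant behind it: n breaks split the number line into n + 1 bins, so exactly n + 1 labels are needed. The function cut_with_labels is a hypothetical stand-in written for illustration, not the Polars implementation. The idea that all-null input leaves fewer usable breaks than requested is only an assumption consistent with the error message; it has not been confirmed in this thread.

```python
from bisect import bisect_left


def cut_with_labels(values, breaks, labels):
    """Assign each value the label of its bin; None stays None."""
    # Same shape rule as the quoted check: n breaks define n + 1 bins.
    if len(labels) != len(breaks) + 1:
        raise ValueError("provide len(quantiles) + 1 labels")
    # bisect_left gives right-closed bins: a value equal to a break
    # falls into the bin that ends at that break.
    return [None if v is None else labels[bisect_left(breaks, v)] for v in values]


# Two breaks, three labels: the check passes.
binned = cut_with_labels([1, 5, 10, None], [3, 7], ["lo", "mid", "hi"])

# ASSUMPTION: if an all-null column yields no usable breaks, the same
# three labels no longer satisfy the check, even though every output
# value would simply be None.
try:
    cut_with_labels([None, None], [], ["1", "2", "3"])
    all_null_error = None
except ValueError as exc:
    all_null_error = str(exc)
```

If that assumption holds, one possible direction for a fix would be to validate the labels against the number of requested quantiles rather than the number of breaks that survive, or to short-circuit when the input is entirely null.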
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Issue description
qcut throws a PanicException when all values are None or NaN.
Expected behavior
Keep the None or NaN values, and have to_physical give -1 for them.
Installed versions