pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

fill_null doesn't support expr #16055

Closed eromoe closed 1 week ago

eromoe commented 1 week ago

Reproducible example

[image: screenshot of the reproducible example]

Log output

No response

Issue description

I found that fill_null doesn't support filling from another column. PS: fill_nan works; however, it cost me an hour to track this problem down.

See the docs: https://docs.pola.rs/py-polars/html/reference/dataframe/api/polars.DataFrame.fill_null.html

Expected behavior

The type annotation is Any, so I think it should support filling with an expr:

value: Any | None = None,

Installed versions

```
--------Version info---------
Polars:               0.20.19
Index type:           UInt32
Platform:             Windows-10-10.0.19041-SP0
Python:               3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:40:08) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:          3.0.0
connectorx:
deltalake:
fastexcel:
fsspec:               2024.3.1
gevent:
hvplot:
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.24.4
openpyxl:             3.1.2
pandas:               1.5.3
pyarrow:              15.0.2
pydantic:             2.6.4
pyiceberg:
pyxlsb:
sqlalchemy:           2.0.29
xlsx2csv:
xlsxwriter:
```

ritchie46 commented 1 week ago

> PS: fill_nan works, however cost my 1 hour to dig this problem .

NaN is a floating-point value (Not a Number), which is not the same as null/None.

lyngc commented 1 week ago

You are dividing by 0, resulting in a NaN value. NaN is not a null (missing) value. Being able to differentiate between null values and NaN values is super useful.

cmdlineluser commented 1 week ago

Would it be worthwhile adding a note/warning to the fill_null docs?

Or maybe linking to the user guide?

The assumption that fill_null will work on NaN has popped up a few times.

eromoe commented 1 week ago

@ritchie46 pandas only has the method fillna, and other functions like notnull and isnull also check whether the value is NaN or not.

ritchie46 commented 1 week ago

That's because pandas mixes floating point NaN and missing values.

We don't follow pandas. It is not correct on this matter.

eromoe commented 1 week ago

@ritchie46 I think it would be better to at least warn about this in the docs, because many people coming from pandas could easily make this mistake.