pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Series.set() results in NotImplementedType #15287

Open MarcusJellinghaus opened 7 months ago

MarcusJellinghaus commented 7 months ago

Reproducible example

import polars as pl
import datetime

import os
os.environ["POLARS_VERBOSE"] = "1"

pl_series_datetime = pl.Series("date", [datetime.date(1751, 1, 1), datetime.date(1754, 1, 1)])
pl_series_datetime_sql_compliant = pl_series_datetime.set(pl_series_datetime.dt.year() < 1753, pl.Null)
pl2 = pl_series_datetime_sql_compliant.abs()

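Here .set() returns NotImplemented (an object of type NotImplementedType) instead of a Series, so the following .abs() call fails.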

Log output

Traceback (most recent call last):
  File "c:\Users\Marcus\Documents\Development\Polars_bug\bug_test.py", line 9, in <module>
    pl2 = pl_series_datetime_sql_compliant.abs()
AttributeError: 'NotImplementedType' object has no attribute 'abs'

Issue description

pl_series_datetime.set(pl_series_datetime.dt.year() < 1753, pl.Null) should return a Series.

Expected behavior

The result should match

expected_result = pl.Series("date", [None, datetime.date(1754, 1, 1)])
print(expected_result)

shape: (2,)
Series: 'date' [date]
[
        null
        1754-01-01
]

Installed versions

```
--------Version info---------
Polars:               0.20.16
Index type:           UInt32
Platform:             Windows-10-10.0.19041-SP0
Python:               3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
hvplot:
matplotlib:
numpy:
openpyxl:
pandas:
pyarrow:
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

MarcusJellinghaus commented 7 months ago

Or is it an issue with my code?

cmdlineluser commented 7 months ago

(pl.Null in the code is "wrong" but not the problem.)

It just seems .set() is not yet supported for Date/Datetime Series.

pl_series_datetime.set(pl_series_datetime.dt.year() < 1753, None)
# NotImplemented

If we test with an Int Series instead, it works:

pl_series_datetime.cast(int).set(pl_series_datetime.dt.year() < 1753, None)
# shape: (2,)
# Series: 'date' [i64]
# [
#   null
#   -78892
# ]
MarcusJellinghaus commented 7 months ago

Thank you for the clarification :-)

Would it make sense to support .set() for Date/Datetime Series, as an enhancement?

reswqa commented 6 months ago

You can switch to when-then-otherwise instead. set is something of an anti-pattern and will block optimizations like predicate pushdown.

import datetime
import polars as pl
pl_series_datetime = pl.Series("date", [datetime.date(1751, 1, 1), datetime.date(1754, 1, 1)])
pl.select(pl.when(pl_series_datetime.dt.year() < 1753).then(None).otherwise(pl_series_datetime))
shape: (2, 1)
┌────────────┐
│ literal    │
│ ---        │
│ date       │
╞════════════╡
│ null       │
│ 1754-01-01 │
└────────────┘