pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Change `dt.week()` to be more consistent with `dt.year()` (Gregorian Year) #17229

Open Queilow opened 4 months ago

Queilow commented 4 months ago

Description


Currently, Series.dt.week() returns the ISO 8601 week number. Although this is a consistent definition of a week, users can easily pair it with dt.year() instead of dt.iso_year(), which leads to hard-to-catch mistakes. For example:

import polars as pl
from datetime import date

df = pl.DataFrame(
    {'date': [date(2022, 1, 1), date(2022, 12, 31)]}
)

df.with_columns(
    [
        pl.col('date').dt.year().alias('Gregorian year'),
        pl.col('date').dt.iso_year().alias('ISO year'),
        pl.col('date').dt.week().alias('ISO week'),
        pl.col('date').dt.to_string('%W').alias('Proposed Week format')
    ]
)

Both dates have the same (Gregorian year, ISO week) tuple, (2022, 52), despite being 12 months apart. I believe a week method compatible with the Gregorian year is practical enough to deserve its own explicit method.
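To make the collision concrete without depending on Polars, here is a minimal sketch using only the standard library's datetime module. The helper name week_keys and the dictionary keys are made up for illustration; `%W` (Monday-based week of the Gregorian year, with days before the first Monday in week 0) is used as a stand-in for the proposed behaviour, on the assumption that this is what the 'Proposed Week format' column above is meant to show.

```python
from datetime import date

dates = [date(2022, 1, 1), date(2022, 12, 31)]


def week_keys(d):
    """Return the (year, week) pairs discussed in this issue for one date.

    Hypothetical helper, for illustration only.
    """
    iso_year, iso_week, _ = d.isocalendar()
    return {
        # The accidental pairing: Gregorian year with ISO week.
        "gregorian_year_iso_week": (d.year, iso_week),
        # The consistent ISO pairing.
        "iso_year_iso_week": (iso_year, iso_week),
        # Gregorian year with the %W week number.
        "gregorian_year_w_week": (d.year, int(d.strftime("%W"))),
    }


keys = [week_keys(d) for d in dates]
for d, k in zip(dates, keys):
    print(d, k)
```

The mixed pairing gives (2022, 52) for both dates, while the ISO pairing and the `%W` pairing each keep them apart.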

Proposal

Discussion

Pros:

Julian-J-S commented 4 months ago

Yes, I like it and imo good idea to bring with/before 1.0!

Date logic is a very common use case and, as you said, the current design is a bit confusing.