uint overflow - Githubissues

eromoe commented 1 week ago

Checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

pl.select(pl.lit(48).cast(pl.UInt32)-pl.lit(850).cast(pl.UInt32))

Log output

literal
u32
4294966494

Issue description

I am checking agg result and find this problem,

    df_agg = df.group_by('datetime').agg(
        uc=pl.col('hit_upper').sum(),
        dc=pl.col('hit_lower').sum(),
    ).with_columns(
        ud_ratio=((pl.col('uc')+1)/(pl.col('dc')+1)).log(), # avoid divide zeroe
        ud_diff=(pl.col('uc')-pl.col('dc')),
    ).collect().to_pandas()

Expected behavior

Detect and avoid overflow, automatically convert pl.Uint to pl.Int

Installed versions

``` --------Version info--------- Polars: 1.9.0 Index type: UInt32 Platform: Windows-10-10.0.19041-SP0 Python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:40:08) [MSC v.1938 64 bit (AMD64)] ----Optional dependencies---- adbc_driver_manager altair cloudpickle 3.0.0 connectorx deltalake fastexcel fsspec 2024.3.1 gevent 24.2.1 great_tables matplotlib 3.8.4 nest_asyncio 1.6.0 numpy 1.24.4 openpyxl 3.1.2 pandas 2.2.2 pyarrow 15.0.2 pydantic 2.6.4 pyiceberg sqlalchemy 2.0.29 torch xlsx2csv xlsxwriter ```

cmdlineluser commented 1 week ago

I think .len() was the last time this topic came up.

https://github.com/pola-rs/polars/issues/17722#issuecomment-2244468461

However, I do think that returning i64 for len is something that we should investigate for 2.0. It is a footgun.

(Although that was just for .len())

orlp commented 1 week ago

Not a bug, something we're thinking about perhaps changing for Polars 2.0, but current semantics for addition/subtraction is that they're wrapping.