pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Casting literal floats to strings rounds the value down. #17773

Closed mmcdermott closed 3 weeks ago

mmcdermott commented 1 month ago

Reproducible example

>>> pl.select(pl.lit(3.2, dtype=pl.Utf8))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3       │
└─────────┘
>>> pl.select(pl.lit(str(3.2)))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3.2     │
└─────────┘
>>> print(pl.build_info()["version"])
1.1.0
>>> pl.select(pl.lit(3.2).cast(pl.Utf8))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3       │
└─────────┘
>>> pl.select(pl.lit(3.2).cast(pl.Utf8).cast(pl.Float64))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ f64     │
╞═════════╡
│ 3.0     │
└─────────┘
>>> pl.select(pl.lit(3.2))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ f64     │
╞═════════╡
│ 3.2     │
└─────────┘

Another example, from 1.2.1, which also highlights that the issue is unique to literals.

>>> pl.select(pl.lit(3.2, dtype=pl.Utf8))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3       │
└─────────┘
>>> pl.select((pl.lit(3.2)**2).cast(pl.Utf8))
shape: (1, 1)
┌────────────────────┐
│ literal            │
│ ---                │
│ str                │
╞════════════════════╡
│ 10.240000000000002 │
└────────────────────┘
>>> pl.select(pl.lit(3.2).cast(pl.Utf8))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3       │
└─────────┘
>>> pl.select(((pl.lit(3.2)**2)**0.5).cast(pl.Utf8))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ 3.2     │
└─────────┘
>>> pl.build_info()["version"]
'1.2.1'

Log output

No response

Issue description

When I create a literal from a floating-point value and then cast it to a string, the value is rounded down to the nearest integer.

Expected behavior

I expect the string version of the float to retain its fractional part, much like Python's str function does.
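
For reference, a minimal pure-Python sketch of the behavior being asked for. It only shows what the built-in `str` does; the sample values are chosen for illustration, and how Polars formats each of them after a fix is not something this thread confirms.

```python
# Sample floats (illustrative values, not taken from the issue except 3.2).
values = [3.2, 0.5, 100.0]

# Python's str keeps the fractional part instead of dropping it.
expected = [str(v) for v in values]
print(expected)

# The string form round-trips back to the same float.
assert all(float(s) == v for s, v in zip(expected, values))
```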

Installed versions

```
--------Version info---------
Polars:               1.1.0
Index type:           UInt32
Platform:             Linux-5.15.0-116-generic-x86_64-with-glibc2.35
Python:               3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:
numpy:                2.0.0
openpyxl:
pandas:
pyarrow:              16.1.0
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```

ritchie46 commented 1 month ago

Right, our AnyValue cast doesn't do the same as our cast in our arrow kernels. This should be fixed.

stevenschaerer commented 1 month ago

I'd be happy to take a look.

EricTulowetzke commented 4 weeks ago

@ritchie46 I have fixed this issue, in which a literal float is extracted as an integer and then cast to a string. My fix first determines the right data type, then extracts the value as a float or an integer accordingly, and only then casts it to a string.

However, if you want the AnyValue version of cast to work like the Arrow version of cast, that will take some time to test. Do you want me to rewrite the cast so the two behave the same, or should I create a PR with just the simple fix for the issue above?

Question: should a Boolean literal be cast to 0 and 1, or to true and false?
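
The fix described in this comment can be sketched in plain Python. This is only an illustration of the logic, not the actual Rust change: both function names are hypothetical, and Boolean handling is deliberately left out because the question above is still open.

```python
from typing import Union


def buggy_literal_to_str(value: Union[int, float]) -> str:
    """Hypothetical model of the reported bug: integer extraction comes first."""
    # Extracting as an integer drops the fractional part before stringifying.
    return str(int(value))


def fixed_literal_to_str(value: Union[int, float]) -> str:
    """Hypothetical model of the fix: pick the data type, then extract, then cast."""
    if isinstance(value, bool):
        # Assumption: undecided in the thread (0/1 vs. true/false), so refuse here.
        raise TypeError("Boolean rendering is not decided in this sketch")
    if isinstance(value, float):
        # Floats are extracted as floats, so the fractional part survives.
        return str(value)
    # Everything else handled here is an integer.
    return str(int(value))


print(buggy_literal_to_str(3.2))  # 3
print(fixed_literal_to_str(3.2))  # 3.2
print(fixed_literal_to_str(3))    # 3
```

The explicit `bool` check comes first because `bool` is a subclass of `int` in Python and would otherwise fall through to the integer branch.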