`Expr.replace` silently fails when `return_dtype=pl.Decimal(scale=BIG)`

pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Other

30.21k stars 1.95k forks source link

from decimal import Decimal as Dec import polars as pl def test_replace_return_dec(): col = pl.Series(name="my_data", values=["a", "b", "c"]) mapping = { "a": Dec("4.0"), "b": Dec("5.0"), "c": Dec("6.0"), } col1 = col.replace(mapping, return_dtype=pl.Decimal(scale=37)) col2 = col.replace(mapping, return_dtype=pl.Decimal(scale=38)) assert tuple(col1) == (Dec("4.0"), Dec("5.0"), Dec("6.0")) # ✅ OK # for some reason, the following assertion fails because values have become null # no error has been raised, it happened silently! assert tuple(col2) == (Dec("4.0"), Dec("5.0"), Dec("6.0")) # ❌ KO with 1.0.0-beta.1 # next line explicitly raises a InvalidOperationError since "conversion failed" _ = pl.Series(col1, dtype=pl.Decimal(scale=38))

The reason is not obvious, but I think I have an "almost why" answer, and it involves the limits of polars Decimal type.

the scale parameter of a pl.Decimal is the number of digits to the right of the decimal.
the precision parameter is the total number of digits.

The precision parameter must be at least one more than the scale parameter, because any value less than 1 must be represented by 0.xxxx and the 0 takes up a digit.

The maximum polars decimal value is 10^38-1, which you can see for yourself:

import polars as pl
from decimal import Decimal

pl.Series([Decimal(10**38-1)])  # ok
pl.Series([Decimal(10**38)])    # error
# RuntimeError: BindingsError: "Decimal is too large to fit in Decimal128"

So when you specify scale=37, that means that precision must be 38 at minimum. which means 38 digits. Now, we can cast a value of [1] to a scale of 37:

pl.Series([1]).cast(pl.Decimal(scale=37))
# shape: (1,)
# Series: '' [decimal[38,37]]
# [
#         1
# ]

But a scale of 38 fails:

pl.Series([1]).cast(pl.Decimal(scale=38)) 
# polars.exceptions.ComputeError: conversion from `i64` to `decimal[38,38]` failed in column '' for 1 out of 1 values: [1]

And likewise a value of 2 fails at precision 37. So going further:

1 fails at scale 38, succeeds at 37
10 fails at scale 37, succeeds at 36
100 fails at scale 36, succeeds at 35 ...etc.

So we are most likely hitting the upper limit when we shouldn't, since we can indeed specify these with precision instead of scale and it works:

pl.Series([1], dtype=pl.Decimal).cast(pl.Decimal(precision=100000, scale=37))
# shape: (1,)
# Series: '' [decimal[100000,37]]
# [
#         1
# ]

So tl;dr the issue involves checks on precision/scale bounds I believe, and it sometimes yells and sometimes doesn't.

pola-rs / polars

`Expr.replace` silently fails when `return_dtype=pl.Decimal(scale=BIG)` #15037

Checks

Reproducible example

Issue description

Expected behavior

Installed versions