Open Julian-J-S opened 3 weeks ago
We have quite some interest in pushing this forward. I have some stuff on my stack still, but I might take a look after that.
Apart from round, are there specific things that need to be addressed surrounding the Decimal type at the moment?
Some aggregations don't seem to be implemented in a groupby context and return null:
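A minimal sketch of the kind of case meant here; mean is only an illustrative guess at an affected aggregation, and the column names are placeholders:
import polars as pl

df = pl.DataFrame(
    {"grp": ["a", "a"], "x": ["1.1", "2.2"]},
    schema_overrides={"x": pl.Decimal},
)
# On affected versions some aggregations reportedly come back as null
# for Decimal columns inside a group_by, instead of a Decimal result.
print(df.group_by("grp").agg(pl.col("x").mean()))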
Apart from round, are there specific things that need to be addressed surrounding the Decimal type at the moment?
Just checked my "decimal_problems-Notebook" and almost everything is actually fixed, awesome! 👍 👏
Short overview:
round not implemented
pl.DataFrame({"x": ["1.23"]}, schema={"x": pl.Decimal}).with_columns(r=pl.col("x").round(decimals=2))
# InvalidOperationError: round can only be used on numeric types
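As a possible stopgap until a native round lands, one idea is to go through Python's decimal module with map_elements; this is only a sketch and assumes Polars hands the values over as decimal.Decimal objects and accepts a Decimal return_dtype (slow, but exact):
import decimal
import polars as pl

df = pl.DataFrame({"x": ["1.234", "1.235"]}, schema={"x": pl.Decimal})
df = df.with_columns(
    r=pl.col("x").map_elements(
        # quantize to 2 fractional digits, rounding half up
        lambda v: v.quantize(decimal.Decimal("0.01"), rounding=decimal.ROUND_HALF_UP),
        return_dtype=pl.Decimal(scale=2),
    )
)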
str to Decimal cast truncates instead of rounding
pl.DataFrame({"x": ["0.04", "0.05", "0.06"]}).with_columns(
    d=pl.col("x").cast(pl.Decimal(scale=1)),
)
┌──────┬──────────────┐
│ x ┆ d │
│ --- ┆ --- │
│ str ┆ decimal[*,1] │
╞══════╪══════════════╡
│ 0.04 ┆ 0.0 │
│ 0.05 ┆ 0.0 │
│ 0.06 ┆ 0.0 │
└──────┴──────────────┘
Other libraries like duckdb round here instead of truncating:
duckdb.sql("""select x, x::decimal(18,1) as d from df""")
┌─────────┬───────────────┐
│ x │ d │
│ varchar │ decimal(18,1) │
├─────────┼───────────────┤
│ 0.04 │ 0.0 │
│ 0.05 │ 0.1 │
│ 0.06 │ 0.1 │
└─────────┴───────────────┘
Multiplying a Decimal col by an int should not increase the scale
(
    pl.DataFrame(
        {"d": ["0.1"], "i": [2]},
        schema_overrides={"d": pl.Decimal},
    ).with_columns(
        mult1=pl.col("d") * 2,
        mult2=pl.col("d") * pl.col("i"),
    )
)
┌──────────────┬─────┬──────────────┬──────────────┐
│ d ┆ i ┆ mult1 ┆ mult2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ decimal[*,1] ┆ i64 ┆ decimal[*,2] ┆ decimal[*,2] │
╞══════════════╪═════╪══════════════╪══════════════╡
│ 0.1 ┆ 2 ┆ 0.20 ┆ 0.20 │
└──────────────┴─────┴──────────────┴──────────────┘
Expectation: the scale after multiplication by an integer is still 1 (an integer behaves like a decimal with scale 0, so under the usual s1 + s2 scale rule for multiplication the result scale should stay 1; this applies only to integer operands).
str->Decimal has precision/scale inference, but float->Decimal is always truncated to (38, 0)
pl.DataFrame(
    {
        "float": [3.45],
        "str": ["3.45"],
    },
).with_columns(
    pl.all().cast(pl.Decimal).name.suffix("_decimal"),
)
┌───────┬──────┬───────────────┬──────────────┐
│ float ┆ str ┆ float_decimal ┆ str_decimal │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ decimal[38,0] ┆ decimal[*,2] │
╞═══════╪══════╪═══════════════╪══════════════╡
│ 3.45 ┆ 3.45 ┆ 3 ┆ 3.45 │
└───────┴──────┴───────────────┴──────────────┘
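Until float->Decimal gets inference, one hedged workaround is a detour via str so the string inference kicks in; this sketch assumes the float-to-string cast produces the short decimal form, which may not hold for every float:
import polars as pl

df = pl.DataFrame({"float": [3.45]})
# cast to str first so the str->Decimal inference picks precision/scale
df = df.with_columns(pl.col("float").cast(pl.String).cast(pl.Decimal).alias("float_decimal"))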
cum_sum not being supported has always been an issue: https://github.com/pola-rs/polars/issues/14745
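For reference, a minimal reproduction sketch of that gap, assuming the behavior described in pola-rs/polars#14745 (on affected versions this fails rather than producing a Decimal running total):
import polars as pl

df = pl.DataFrame({"x": ["1.10", "2.20"]}, schema={"x": pl.Decimal})
# expected: a Decimal running total [1.10, 3.30]; see pola-rs/polars#14745
print(df.with_columns(total=pl.col("x").cum_sum()))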
Would echo the core point: this will be a huge unlock for those building financial applications on top of Polars.
Apart from round, are there specific things that need to be addressed surrounding the Decimal type at the moment?
Also added #19756 to the list, where a Decimal calculation results in null or incorrect values instead of raising.
Description
Problem
Many (financial) applications and transformations require a level of correctness that cannot be achieved with float, only with decimal. One big missing feature is round on Decimal (currently raises "InvalidOperationError: round can only be used on numeric types"), but there are a few other quirks that show the Decimal type is not production-ready yet (it is also documented as experimental). This is unfortunate because otherwise polars is just soooooo good ❤️ Currently I either use pyspark or, when working in-process, duckdb (which requires duckdb + pyarrow and is quite big) for rounding before going back to polars; a sketch of that round-trip is below.
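A minimal sketch of that duckdb round-trip, assuming duckdb can scan the polars frame from the local scope (it needs pyarrow for this) and that the Decimal type survives the Arrow conversion in both directions; the column name and target scale are placeholders:
import duckdb
import polars as pl

df = pl.DataFrame({"x": ["1.005", "2.345"]}, schema={"x": pl.Decimal})
# round in duckdb SQL, then convert the relation back to a polars DataFrame
rounded = duckdb.sql("select round(x, 2) as x from df").pl()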
Question
Is there any roadmap for the Decimal type? I remember someone saying long ago "once 1.0 is there we focus on those things" 😃 Not meaning to put on any pressure, but if there is no timeline for Decimal, many people (including me) will probably use a different engine like duckdb for the whole transformation/calculation. I would love to look into implementing this myself, but I am not sure I will find the time and I don't have much experience with decimal details. 😇
Some issues:
#19756 (null or incorrect results)
#15151 (round support)
#19889 (fill_null not supported)
#15154
#15153