pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Adding “Rounding half to even” #17798

Open · wany-oh opened this issue 3 months ago

wany-oh commented 3 months ago

Description

In Python, round() uses “round half to even”, but Polars' Series.round()/Expr.round() do not. It would be useful to add an argument such as to_even to these methods. At the very least, the documentation should mention this behavior.

import numpy as np
import pandas as pd
import polars as pl

round(0.5)  # -> 0
np.round(0.5)  # -> np.float64(0.0)
pd.Series(0.5).round()[0]  # -> np.float64(0.0)

pl.Series([0.5]).round().item()  # -> 1.0

expected:

pl.Series([0.5]).round(to_even=True).item()  # -> 0.0
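The two tie-breaking rules being compared can be sketched in plain Python. This is a minimal illustration assuming finite float inputs; the helper names round_half_away_from_zero and round_half_to_even are hypothetical and are not part of Polars. The first follows the half-away-from-zero rule that the Polars output above shows (the same rule as Rust's f64::round); the second spells out what Python's built-in round() does for floats.

```python
import math


def round_half_away_from_zero(x: float) -> float:
    """Hypothetical helper: nearest integer, ties (x.5) go away from zero."""
    magnitude = abs(x)
    whole = math.floor(magnitude)   # int
    frac = magnitude - whole        # fractional part (exact, since magnitude >= 0)
    if frac >= 0.5:
        whole += 1
    return math.copysign(whole, x)  # restore the sign; returns a float


def round_half_to_even(x: float) -> float:
    """Hypothetical helper: nearest integer, ties go to the even neighbour."""
    whole = math.floor(x)           # int, rounds toward -inf
    frac = x - whole                # how far x lies above its floor
    if frac > 0.5 or (frac == 0.5 and whole % 2 == 1):
        whole += 1
    return float(whole)


values = [0.5, 1.5, 2.5, -0.5, -1.5]
print([round_half_away_from_zero(v) for v in values])  # [1.0, 2.0, 3.0, -1.0, -2.0]
print([round_half_to_even(v) for v in values])         # [0.0, 2.0, 2.0, 0.0, -2.0]
```

The only difference between the two is the branch taken when the fractional part is exactly 0.5; every other input rounds the same way under both rules.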
eitsupi commented 3 weeks ago

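I was surprised that Polars' round() does not use banker's rounding (round half to even). This could be a serious weakness for precise calculations, since rounding every tie away from zero introduces a systematic bias into aggregates of rounded values.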

>>> pl.Series([0.5, 1.5, 2.5]).round()
shape: (3,)
Series: '' [f64]
[
        1.0
        2.0
        3.0
]
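The bias is easiest to see on a sum. The sketch below uses Python's standard decimal module (not Polars) so that the inputs are exact ties; decimal's ROUND_HALF_UP breaks ties away from zero, which matches the Polars output above for these positive values.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

values = [Decimal("0.5"), Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]
unit = Decimal("1")  # quantize to whole numbers

exact_total = sum(values)
half_up_total = sum(v.quantize(unit, rounding=ROUND_HALF_UP) for v in values)
half_even_total = sum(v.quantize(unit, rounding=ROUND_HALF_EVEN) for v in values)

print(exact_total)      # 8.0
print(half_up_total)    # 10  (every tie rounded up: 1 + 2 + 3 + 4)
print(half_even_total)  # 8   (ties alternate: 0 + 2 + 2 + 4)
```

With half-up, every tie adds +0.5 of error, so the error grows with the number of ties; with half-to-even the errors cancel on evenly spread data.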

Note that Rust also provides this rounding mode, as round_ties_even (rust-lang/rust#96710).

johnchristopherjones commented 2 weeks ago

It also differs from what NumPy (and Python's built-in round()) does:

In [226]: round(0.3125, 3)
Out[226]: 0.312

In [227]: np.round(0.3125, 3)
Out[227]: np.float64(0.312)

In [228]: pl.Series([0.3125]).round(3)
Out[228]: 

shape: (1,)
Series: '' [f64]
[
        0.313
]
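0.3125 is a fair test input because it equals 5/16, so the float stores it exactly and it is a genuine tie at three decimal places. A value such as 2.675, by contrast, is stored slightly below the midpoint, so it is not a tie at all. A small check with the standard decimal module, used here only for illustration:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

x = 0.3125
exact = Decimal(x)  # the exact binary value of the float
print(exact == Decimal("0.3125"))  # True: 0.3125 == 5/16 is exactly representable

step = Decimal("0.001")  # three decimal places
print(exact.quantize(step, rounding=ROUND_HALF_EVEN))  # 0.312, matches round() / np.round above
print(exact.quantize(step, rounding=ROUND_HALF_UP))    # 0.313, matches the Polars output above
print(round(x, 3))                                     # 0.312

# 2.675 is stored slightly below the midpoint, so both rules round it down.
print(Decimal(2.675) < Decimal("2.675"))  # True
print(round(2.675, 2))                    # 2.67
```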

Being able to switch strategies is often useful when coordinating with a party that uses the "wrong" one. But even IEEE 754 specifies rounding ties to even (roundTiesToEven) as the default.
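The IEEE 754 default can be observed directly from Python, assuming a typical platform where floats are IEEE 754 binary64: both int-to-float conversion and float addition round exact ties to the neighbour with an even significand. Between 2**53 and 2**54 adjacent doubles are exactly 2 apart, so every odd integer in that range is a tie.

```python
# Assumes IEEE 754 binary64 floats (true on mainstream CPython platforms).
big = 2**53  # from here to 2**54, adjacent doubles are exactly 2 apart

# Odd integers in this range sit exactly halfway between two doubles.
print(float(big + 1) == big)      # True: tie goes to big (even significand)
print(float(big + 3) == big + 4)  # True: tie goes to big + 4, not big + 2

# Float addition follows the same rule: 2**-53 is half the gap above 1.0.
print(1.0 + 2**-53 == 1.0)               # True
print(1.0 + 3 * 2**-53 == 1.0 + 2**-51)  # True
```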

The prevalence of the half-up strategy seems to be mostly due to C predating IEEE 754 and other languages directly adopting C's default round() behavior. Most languages that are either geared towards numerical analysis or simply newer seem to default to half-even.

Quick biased survey:

Half-up

Half-Even