pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Weird/Inconsistent default float / f64 formatting within the same column #12346

Status: open. Julian-J-S opened this issue 10 months ago.

Julian-J-S commented 10 months ago

Reproducible example

import polars as pl

pl.DataFrame(
    {
        "ints": [10**i for i in range(4, 10)],
        "floats": [10.0**i for i in range(4, 10)],
    },
)
┌────────────┬──────────┐
│ ints       ┆ floats   │
│ ---        ┆ ---      │
│ i64        ┆ f64      │
╞════════════╪══════════╡
│ 10000      ┆ 10000.0  │
│ 100000     ┆ 100000.0 │ >>>>>>>>>> format 1
│ 1000000    ┆ 1e6      │
│ 10000000   ┆ 1e7      │ >>>>>>>>>> format 2
│ 100000000  ┆ 1e8      │
│ 1000000000 ┆ 1.0000e9 │ >>>>>>>>>> format 3
└────────────┴──────────┘

Issue description

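The default float formatting is inconsistent within a single column: the same f64 column above mixes plain decimals (`10000.0`), short scientific notation (`1e6`), and padded scientific notation (`1.0000e9`).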

Expected behavior

A consistent format for all values within a single column.
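One way to read "consistent within a single column" is to choose a single style per column from the column's largest magnitude. A minimal pure-Python sketch of that idea; `format_float_column` and its `1e9` threshold are hypothetical choices for illustration, not polars' actual logic:

```python
# Hypothetical helper, not part of polars: choose ONE format for a whole
# column based on its largest magnitude, instead of deciding per value.
def format_float_column(values, sci_threshold=1e9, precision=1):
    """Format every value in `values` with the same style.

    If any |value| >= sci_threshold, the whole column uses scientific
    notation; otherwise every value uses fixed-point notation.
    """
    magnitudes = [abs(v) for v in values if v == v]  # skip NaN (NaN != NaN)
    use_sci = bool(magnitudes) and max(magnitudes) >= sci_threshold
    fmt = f".{precision}e" if use_sci else f".{precision}f"
    return [format(v, fmt) for v in values]


print(format_float_column([10.0**i for i in range(4, 9)]))
print(format_float_column([10.0**i for i in range(4, 10)]))
```

With `range(4, 9)` every value stays fixed-point (`'100000000.0'`); adding `1e9` switches the whole column to scientific (`'1.0e+09'`), so the style never changes partway down the column.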

Installed versions

```
0.19.12
```

cmdlineluser commented 10 months ago

Yeah, "mixed" is the default mode.

https://pola-rs.github.io/polars/docs/python/dev/reference/api/polars.Config.set_fmt_float.html

import polars as pl

# fmt_float="full" only applies inside this block
with pl.Config(fmt_float="full"):
    df = pl.DataFrame(
        {
            "ints": [10**i for i in range(4, 10)],
            "floats": [10.0**i for i in range(4, 10)],
        },
    )
    print(df)

# shape: (6, 2)
# ┌────────────┬────────────┐
# │ ints       ┆ floats     │
# │ ---        ┆ ---        │
# │ i64        ┆ f64        │
# ╞════════════╪════════════╡
# │ 10000      ┆ 10000      │
# │ 100000     ┆ 100000     │
# │ 1000000    ┆ 1000000    │
# │ 10000000   ┆ 10000000   │
# │ 100000000  ┆ 100000000  │
# │ 1000000000 ┆ 1000000000 │
# └────────────┴────────────┘

Commit https://github.com/pola-rs/polars/commit/89fadafcd7105268bdd61ba1c21ba8c70ddc7e25 also added `Config.set_float_precision`: https://pola-rs.github.io/polars/docs/python/dev/reference/api/polars.Config.set_float_precision.html

I'm not sure whether that addresses your issue, though.

Julian-J-S commented 10 months ago

> Yeah, "mixed" is the default mode.

Cool, thanks! I didn't know about that option.

In general, I think it is nice to adapt the formatting to the data, especially for very large values. However, jumping from `1e8` to `1.0000e9` within the same column feels strange.

I will look into that commit to check whether anything in the default formatting changed.