pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Support for Column-Specific Float Precision #14766

Open d-reynol opened 8 months ago

d-reynol commented 8 months ago

Description

Proposal: write_csv() & sink_csv() currently let you specify a single precision, which is applied to all float columns.

Update this behavior to also accept a dictionary of {'col': precision}, allowing column-specific formatting.

Background & Use Case: I'm currently using polars for ETL in a legacy environment, and the downstream tooling expects each column to have the same precision as the corresponding database column: NUMERIC(19,6) expects 6 decimal places, NUMERIC(5,3) expects 3, and so on.

This is related to #11929 & #7133, but is a narrower request.

Julian-J-S commented 8 months ago

I see your point and it makes sense!

Ideally you would use the Decimal type, which maps exactly to those database types with a specific precision and scale! Unfortunately, this is still experimental and has lots of limitations. I just checked, and it's not possible to write a Decimal to CSV 😆

Example for the Decimal type:

import polars as pl

(
    pl.DataFrame(
        {
            "x": ["0.123", "0.234", "0.345"],
            "y": ["0.123", "0.234", "0.345"],
        }
    )
    .with_columns(
        pl.col("x").cast(pl.Decimal(precision=19, scale=6)),
        pl.col("y").cast(pl.Decimal(precision=5, scale=3)),
    )
    # .write_csv("decimal.csv")  # not working
)

# shape: (3, 2)
# ┌───────────────┬──────────────┐
# │ x             ┆ y            │
# │ ---           ┆ ---          │
# │ decimal[19,6] ┆ decimal[5,3] │
# ╞═══════════════╪══════════════╡
# │ 0.123000      ┆ 0.123        │
# │ 0.234000      ┆ 0.234        │
# │ 0.345000      ┆ 0.345        │
# └───────────────┴──────────────┘

d-reynol commented 8 months ago

Yes @JulianCologne, I agree that would be the ideal solution.

cmdlineluser commented 2 months ago

@Julian-J-S Just stumbled upon this issue - it seems your example now works (and sink_csv also).