pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Support `Decimal` in `read_csv` #17349

Closed Julian-J-S closed 1 month ago

Julian-J-S commented 2 months ago

Description

Now that writing Decimal to csv with write_csv is available (yeah! 🥳 ) I would love to see the same for reading 😄

import polars as pl

(
    pl.DataFrame({"x": ["0.1", "0.2"]})
    .with_columns(pl.col("x").cast(pl.Decimal(scale=2)))
    .write_csv("decimal.csv")
)

creates

x
"0.10"
"0.20"

but reading then fails

pl.read_csv(source="decimal.csv", schema={"x": pl.Decimal(scale=1)})

# ComputeError: unsupported data type when reading CSV: decimal[*,1] when reading CSV

Ofc I can read as str and parse afterwards but converting while reading would be faster and more efficient 😃

sherlockbeard commented 2 months ago

you can do like

pl.read_csv(source="decimal.csv", schema={"x": pl.Float64()})

┌─────┐
│ x   │
│ --- │
│ f64 │
╞═════╡
│ 0.1 │
│ 0.2 │
└─────┘

Julian-J-S commented 2 months ago

> you can do like
>
> pl.read_csv(source="decimal.csv", schema={"x": pl.Float64()})
>
> ┌─────┐
> │ x   │
> │ --- │
> │ f64 │
> ╞═════╡
> │ 0.1 │
> │ 0.2 │
> └─────┘

How does that help? 🤔 Float and Decimal are very different types with different use cases. If I read as float I immediately get the imprecision that I want to avoid
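
A small illustration of the difference, using only Python's standard library rather than Polars (so this is a sketch of the general float-vs-decimal behavior, not of anything Polars-specific):

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimals parsed from the original text stay exact.
decimal_sum = Decimal("0.10") + Decimal("0.20")
print(decimal_sum == Decimal("0.30"))  # True
print(decimal_sum)  # 0.30
```

Once the CSV text has gone through a float, the exact decimal value is already lost, so casting to Decimal afterwards cannot reliably bring it back.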

sherlockbeard commented 2 months ago

> How does that help? 🤔 Float and Decimal are very different types with different use cases. If I read as float I immediately get the imprecision that I want to avoid

Ah, sorry... Yes, it won't help you with precision.