pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

polars.exceptions.ComputeError: unsupported data type when reading CSV: enum when reading CSV #20112

Open gsamatt opened 1 day ago

gsamatt commented 1 day ago

Reproducible example

import polars as pl

schema = pl.Schema({"A": pl.Int32, "B": pl.Enum(["1", "2", "3", "4", "5"])})
df = pl.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["5", "4", "3", "2", "1"]}, schema=schema)
df.write_csv("example.csv")
pl.read_csv("example.csv", schema=schema)  # raises ComputeError

Log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.10/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "lib/python3.10/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "lib/python3.10/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "lib/python3.10/site-packages/polars/io/csv/functions.py", line 527, in read_csv
    df = _read_csv_impl(
  File "lib/python3.10/site-packages/polars/io/csv/functions.py", line 672, in _read_csv_impl
    pydf = PyDataFrame.read_csv(
polars.exceptions.ComputeError: unsupported data type when reading CSV: enum when reading CSV

Issue description

`pl.read_csv` cannot parse a column whose schema dtype is `pl.Enum`; it raises `ComputeError` instead.

Expected behavior

`pl.read_csv` should not raise, and should parse the column as an Enum from the CSV.

Installed versions

```
--------Version info---------
Polars: 1.15.0
```

Zan-L commented 1 day ago

Just adding something related here - the same error happens for Decimal type (if we specify it in the schema).

cmdlineluser commented 1 day ago

Enum support seems to be marked as a TODO in the comments:

https://github.com/pola-rs/polars/blob/e8054870fb44d4ac3ab0794a7b3f411367d0988f/crates/polars-io/src/csv/read/buffer.rs#L555-L558


@Zan-L You can create a new issue if you have a failing example:

pl.read_csv(b"a,b\n1.2,3.4", schema={"a": pl.Decimal(scale=3), "b": pl.String})
# shape: (1, 2)
# ┌──────────────┬─────┐
# │ a            ┆ b   │
# │ ---          ┆ --- │
# │ decimal[*,3] ┆ str │
# ╞══════════════╪═════╡
# │ 1.200        ┆ 3.4 │
# └──────────────┴─────┘

Zan-L commented 19 hours ago

@cmdlineluser Sorry for the misinformation - Decimal can be read correctly now! It had been in my backlog for quite some time, but I never got around to submitting a new issue. Never mind!