Open pall-j opened 10 months ago
Can you create a minimal working example? For CSV files, it's useful to use StringIO for this:
from io import StringIO
import polars as pl
csv = StringIO("""
name,age
Frank,30
Michelle,58
Peter,78
""")
df = pl.read_csv(csv)
print(df)
shape: (3, 2)
┌──────────┬─────┐
│ name     ┆ age │
│ ---      ┆ --- │
│ str      ┆ i64 │
╞══════════╪═════╡
│ Frank    ┆ 30  │
│ Michelle ┆ 58  │
│ Peter    ┆ 78  │
└──────────┴─────┘
Also, looking closely at the CSV text you provided, you have some non-ASCII quotes in there:
"""error”"
This is not properly escaped: you have three double quotes (Unicode 34), followed by a right double quotation mark (Unicode 8221), and then a single double quote.
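A quick way to spot such characters is to list every non-ASCII character in the text together with its code point. The snippet below is only a sketch using the Python standard library; the helper name `non_ascii_chars` is made up for illustration and is not part of Polars.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, character, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# The field from the CSV text, exactly as it appears in the file.
field = '"""error”"'

for index, ch, code_point, name in non_ascii_chars(field):
    print(index, ch, code_point, name)
# prints: 8 ” 8221 RIGHT DOUBLE QUOTATION MARK
```

The plain double quotes (code point 34) are ASCII and are therefore not reported; only the curly quote shows up.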
The first and last double quotes enclose the string; the second and third double quotes together form a single escaped double quote. The Unicode 8221 character should be kept as is, without any escaping (it is an intentional non-ASCII character).
Thus the string should be parsed as "error” without any issues. As I wrote, this is done correctly when try_parse_dates is set to False or when schema is used instead of dtypes.
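To illustrate the quoting rule described above, here is a small sketch that parses the same field with Python's standard `csv` module, which uses the doubled-quote escaping convention by default. This is not Polars' parser, so it only shows how the convention is meant to work, not what Polars does internally. The `value` header is a placeholder added for the example.

```python
import csv
from io import StringIO

# One header line plus the field in question, exactly as written in the CSV.
raw = 'value\n"""error”"\n'

rows = list(csv.reader(StringIO(raw)))
parsed = rows[1][0]

print(parsed)                    # "error”
print([ord(c) for c in parsed])  # starts with 34, ends with 8221
```

The outer quotes are consumed as field delimiters, the doubled quote becomes one literal double quote, and the Unicode 8221 character passes through unchanged.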
Updated
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Python 3.10.10 Polars 0.19.3
Log output
With RUST_BACKTRACE=full
Issue description
If we set try_parse_dates in the provided example to False, or if we use schema instead of dtypes, then the data is loaded correctly without any problem. This is the behavior I would expect even in the example case.
There seems to be an internal bug in Polars, as an unhandled PanicException is being raised.
Expected behavior
Installed versions