pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

polars.read_csv_batched encoding parameter does not work #18453

Closed Jang-Ji-Yeon closed 1 week ago

Jang-Ji-Yeon commented 2 weeks ago

Reproducible example

pl.read_csv_batched(csv_path, separator=',', try_parse_dates=False, batch_size=chunk_size, encoding='euc-kr', infer_schema_length=chunk_size, quote_char='"').next_batches(chunk_size)

Log output

No response

Issue description

The encoding parameter of polars.read_csv_batched does not work when its value is anything other than utf8 or utf8-lossy.

Expected behavior

I need a way to read CSV files encoded in euc-kr.

Installed versions

--------Version info---------
Polars:              1.6.0
Index type:          UInt32
Platform:            Windows-10-10.0.22631-SP0
Python:              3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               22.10.2
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                1.24.4
openpyxl             3.1.2
pandas               1.5.3
pyarrow              6.0.1
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           1.4.51
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

hxzhao527 commented 2 weeks ago

The encoding parameter doesn't work as the documentation describes: https://github.com/pola-rs/polars/blob/9d5d7d0da9104dd41801b547b917e39c8f7f93aa/py-polars/polars/io/csv/functions.py#L786-L790

It can only be 'utf8' or 'utf8-lossy': https://github.com/pola-rs/polars/blob/9d5d7d0da9104dd41801b547b917e39c8f7f93aa/py-polars/polars/io/csv/functions.py#L968 https://github.com/pola-rs/polars/blob/9d5d7d0da9104dd41801b547b917e39c8f7f93aa/crates/polars-io/src/csv/read/options.rs#L325-L331

ritchie46 commented 1 week ago

This is expected and documented. Polars' native reader only supports utf8.
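
Since the native reader only accepts utf8, one possible workaround is to transcode the file to UTF-8 first and then hand the converted copy to read_csv_batched. The sketch below is not from the maintainers; the helper names transcode_to_utf8 and read_first_batches, and the file names in the usage note, are made up for illustration. It assumes the source file really is valid euc-kr and that there is disk space for a second copy.

```python
import shutil
from pathlib import Path


def transcode_to_utf8(src, dst, src_encoding="euc-kr", chunk_chars=1 << 20):
    """Stream-copy a text file from src_encoding to UTF-8 (hypothetical helper).

    The file is decoded and re-encoded in chunks of chunk_chars characters,
    so it never has to fit in memory. newline="" keeps line endings unchanged.
    """
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout, chunk_chars)
    return Path(dst)


def read_first_batches(utf8_path, batch_size, n_batches=1):
    """Read the first batches of an already-UTF-8 CSV (hypothetical helper)."""
    import polars as pl  # imported lazily so the transcoding step works without polars

    reader = pl.read_csv_batched(utf8_path, batch_size=batch_size)
    # next_batches returns a list of DataFrames, or None once the file is exhausted
    return reader.next_batches(n_batches)
```

Usage would look roughly like transcode_to_utf8("data_euckr.csv", "data_utf8.csv") followed by read_first_batches("data_utf8.csv", batch_size=chunk_size). A decoding error during the first step would indicate the file is not actually euc-kr (cp949 is a common superset worth trying in that case).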