pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Reading csvs through a TextIOWrapper raises OSError: failed to write whole buffer #17428

Open tdsmith opened 1 month ago

tdsmith commented 1 month ago

Reproducible example

import io
import polars as pl

fixture = """\
first_name,last_name
josé,österlung
additional,content
"""

k = 500  # does not trigger with k=50
long_fixture = fixture * k

# imagine `fixture_bytes` is a large file opened in binary mode:
fixture_bytes = io.BytesIO(long_fixture.encode("latin1"))

# so we need to wrap it in TextIOWrapper to present TextIO to polars:
wrapper = io.TextIOWrapper(fixture_bytes, "latin1", "replace")

pl.read_csv(wrapper)

yields:

OSError: failed to write whole buffer

Log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/io/csv/functions.py", line 418, in read_csv
    df = _read_csv_impl(
         ^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/io/csv/functions.py", line 564, in _read_csv_impl
    pydf = PyDataFrame.read_csv(
           ^^^^^^^^^^^^^^^^^^^^^
OSError: failed to write whole buffer

Issue description

Collecting the data into a StringIO before passing it to polars with e.g.

restringified = io.StringIO(wrapper.read())
pl.read_csv(restringified)

works, but the whole input is then read into memory first, so it is no longer a streaming operation.
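
A possible middle ground, sketched here under assumptions rather than confirmed anywhere in this thread: re-encode the wrapper's text to UTF-8 in fixed-size chunks and hand the resulting bytes to polars, so that the decode step at least stays incremental. The helper name `transcode_to_utf8` and the chunk size are hypothetical, and whether the bytes path avoids the error has not been verified here.

```python
import io


def transcode_to_utf8(text_stream, binary_sink, chunk_chars=64 * 1024):
    """Copy text from `text_stream` into `binary_sink` as UTF-8, one chunk at a time.

    Hypothetical helper for illustration; not part of polars.
    """
    while True:
        chunk = text_stream.read(chunk_chars)
        if not chunk:  # empty string signals end of stream
            break
        binary_sink.write(chunk.encode("utf-8"))


# Same fixture as in the reproducible example.
fixture = "first_name,last_name\njosé,österlung\nadditional,content\n"
long_fixture = fixture * 500

wrapper = io.TextIOWrapper(io.BytesIO(long_fixture.encode("latin1")), "latin1", "replace")

# A BytesIO sink keeps the sketch self-contained; a temporary file opened in
# binary mode could be substituted to keep large inputs off the heap.
sink = io.BytesIO()
transcode_to_utf8(wrapper, sink)
sink.seek(0)
utf8_bytes = sink.getvalue()

# Assumption: the UTF-8 bytes object can then be passed on, e.g. pl.read_csv(sink).
```

The copy never holds more than one decoded chunk plus the sink, so with a file-backed sink peak memory no longer scales with the decoded text.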

Expected behavior

pl.read_csv(wrapper) should return a DataFrame instead of raising OSError.

Installed versions

```
--------Version info---------
Polars: 1.0.0
Index type: UInt32
Platform: macOS-14.5-arm64-arm-64bit
Python: 3.12.4 (main, Jul 3 2024, 11:45:52) [Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:
numpy:
openpyxl:
pandas:
pyarrow:
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```

raayu83 commented 3 weeks ago

I stumbled upon the same problem when trying to implement polars import/export features for pyexasol (Exasol Database client library). According to the docs, read_csv should accept both IO[str] and IO[bytes], which covers TextIOWrapper, so this is probably a bug?