pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

write_csv ignores formatting when writing to io.StringIO() #18825

Open bskubi opened 1 day ago

bskubi commented 1 day ago

Reproducible example

from polars import DataFrame
import io

df = DataFrame({"c1":[1, 2], "c2":[3, 4]})
buffer = io.StringIO()
df.write_csv(buffer, separator="\t", include_header=False)
print(buffer.getvalue())

Expected output:

1      3
2      4

Wrong output:

c1,c2
1,3
2,4

Log output

No response

Issue description

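I'm trying to obtain a Python string containing the output that write_csv would write. When the target is an io.StringIO() buffer, the separator and include_header arguments are ignored and the default comma-separated output with a header row is produced instead.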

Expected behavior

The code sample above should print a string that is tab-delimited and has no header, per the options I specify with write_csv.

Installed versions

```
--------Version info---------
Polars:              1.7.1
Index type:          UInt32
Platform:            Linux-6.8.0-40-generic-x86_64-with-glibc2.35
Python:              3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager
altair
cloudpickle          3.0.0
connectorx
deltalake
fastexcel
fsspec               2024.6.1
gevent
great_tables
matplotlib           3.9.1
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl
pandas               2.2.2
pyarrow              16.1.0
pydantic             2.8.2
pyiceberg
sqlalchemy           2.0.34
torch
xlsx2csv
xlsxwriter
```

cmdlineluser commented 21 hours ago

Can reproduce.

This just seems to be an issue in the Python logic.

The .write_csv(buf) call on line 2862 of frame.py doesn't pass the user's parameters (separator, include_header, and so on) through, so the defaults are used.

https://github.com/pola-rs/polars/blob/938494534f66914afe2815f1d718e0024bdf224b/py-polars/polars/dataframe/frame.py#L2860-L2871
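
The linked lines aren't reproduced here, so the following is only a sketch of the general pattern being described, not the actual Polars source. The names write_rows, write_rows_buggy and write_rows_fixed are hypothetical, and the standard-library csv module stands in for the real writer. The point is that a StringIO branch which delegates to a bytes buffer without forwarding the caller's options silently falls back to the defaults.

```python
import csv
import io


def write_rows(rows, header, file, separator=",", include_header=True):
    """Hypothetical core writer: encodes rows as CSV into a binary buffer."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter=separator, lineterminator="\n")
    if include_header:
        writer.writerow(header)
    writer.writerows(rows)
    file.write(text.getvalue().encode("utf-8"))


def write_rows_buggy(rows, header, file, separator=",", include_header=True):
    if isinstance(file, io.StringIO):
        buf = io.BytesIO()
        # Bug pattern: the options are not forwarded, so the defaults apply.
        write_rows(rows, header, buf)
        file.write(buf.getvalue().decode("utf-8"))
        return
    write_rows(rows, header, file, separator, include_header)


def write_rows_fixed(rows, header, file, separator=",", include_header=True):
    if isinstance(file, io.StringIO):
        buf = io.BytesIO()
        # Fix: pass the caller's options through to the delegated call.
        write_rows(rows, header, buf, separator, include_header)
        file.write(buf.getvalue().decode("utf-8"))
        return
    write_rows(rows, header, file, separator, include_header)


rows, header = [[1, 3], [2, 4]], ["c1", "c2"]
buggy, fixed = io.StringIO(), io.StringIO()
write_rows_buggy(rows, header, buggy, separator="\t", include_header=False)
write_rows_fixed(rows, header, fixed, separator="\t", include_header=False)
print(repr(buggy.getvalue()))
print(repr(fixed.getvalue()))
```

In this sketch the buggy variant reproduces the comma-separated output with a header from the report, while the fixed variant yields the tab-separated, header-free text the caller asked for.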

mcrumiller commented 9 hours ago

Maybe we should rename the parameter to source_schema? This issue seems to pop up quite often.