pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`df.write_excel` does not work with file objects #18849

Open littleblubber opened 2 months ago

littleblubber commented 2 months ago

Reproducible example

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})

with open("dataframe.xlsx", "wb") as f:
    df.write_excel(f)

Log output

Traceback (most recent call last):
  File "<input>", line 6, in <module>
  File "~/.../polars/dataframe/frame.py", line 3363, in write_excel
    wb, ws, can_close = _xl_setup_workbook(workbook, worksheet)
  File "~/.../polars/io/spreadsheet/_write_utils.py", line 595, in _xl_setup_workbook
    file = Path("dataframe.xlsx" if workbook is None else workbook)
  File "/usr/lib/python3.9/pathlib.py", line 1072, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib/python3.9/pathlib.py", line 697, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.9/pathlib.py", line 681, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not BufferedWriter

Issue description

_xl_setup_workbook incorrectly defaults to treating file objects like io.BufferedWriter, fsspec.implementations.local.LocalFileOpener, or s3fs.S3File as paths.

This is because the isinstance(workbook, BytesIO) condition is False for these objects, so they fall through to the branch that wraps the argument in Path(...).

By contrast, write_csv and write_json methods work fine with file-like objects such as the ones mentioned above.
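
The failure described above can be reproduced with the standard library alone. The following is a minimal sketch, not Polars code: `describe_target` is a hypothetical helper that runs the two checks in question (the `isinstance(..., BytesIO)` test and the `os.fspath` call that `pathlib` makes in the traceback) against a real file handle, an in-memory buffer, and a path.

```python
import io
import os
import tempfile
from pathlib import Path


def describe_target(target):
    """Hypothetical helper: report how a write target looks to the two checks."""
    is_bytesio = isinstance(target, io.BytesIO)
    try:
        os.fspath(target)  # the call that fails inside pathlib in the traceback
        is_pathlike = True
    except TypeError:
        is_pathlike = False
    return {"is_bytesio": is_bytesio, "is_pathlike": is_pathlike}


# A handle returned by open(..., "wb") is a BufferedWriter.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "dataframe.xlsx"), "wb") as f:
        file_type = type(f).__name__
        file_result = describe_target(f)

buffer_result = describe_target(io.BytesIO())
path_result = describe_target(Path("dataframe.xlsx"))

print(file_type, file_result)
print("BytesIO", buffer_result)
print("Path", path_result)
```

The file handle is neither a `BytesIO` nor path-like, which is why it ends up in the `Path(...)` branch and raises the `TypeError`.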

Expected behavior

Similar to write_csv or write_json, the expected behaviour in _xl_setup_workbook is to explicitly check that the workbook is a string/path-like object, otherwise defaulting to treating the workbook as file-like.

I think this could be accomplished by modifying the existing if-else statement along the following lines:

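if workbook is None or isinstance(workbook, (str, os.PathLike)):
    # None falls back to the default file name; str/PathLike values are used as given
    file = Path("dataframe.xlsx" if workbook is None else workbook)
    # ...
else:
    # anything else is treated as a file-like object
    wb, ws, can_close = Workbook(workbook, workbook_options), None, True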
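
To make the proposed dispatch easier to test in isolation, here is a standalone sketch of the same idea. It is an illustration under stated assumptions, not the actual `_xl_setup_workbook` implementation: `resolve_target` is a hypothetical name, the `hasattr(..., "write")` duck-typing check is my assumption about what "file-like" should mean, and any other input types the real function may accept are ignored.

```python
import io
import os
from pathlib import Path


def resolve_target(workbook=None):
    """Hypothetical helper: classify a write target as a path or a file-like object."""
    if workbook is None:
        # Same default file name as in the snippet above.
        return "path", Path("dataframe.xlsx")
    if isinstance(workbook, (str, os.PathLike)):
        return "path", Path(workbook)
    if hasattr(workbook, "write"):
        # Assumption: anything with a write() method counts as file-like,
        # which covers BytesIO, BufferedWriter, and fsspec-style handles.
        return "file-like", workbook
    raise TypeError(f"unsupported workbook target: {type(workbook).__name__}")


buffer = io.BytesIO()
print(resolve_target())
print(resolve_target("out.xlsx"))
print(resolve_target(buffer)[0])
```

Checking for paths first and treating everything else as file-like means new file object types do not need to be listed one by one.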

Installed versions

```
pl.show_versions()
--------Version info---------
Polars:              1.7.1
Index type:          UInt32
Platform:            Linux-5.15.0-122-generic-x86_64-with-glibc2.31
Python:              3.9.5 (default, Nov 23 2021, 15:27:38) [GCC 9.3.0]
----Optional dependencies----
adbc_driver_manager
altair
cloudpickle
connectorx
deltalake
fastexcel            0.11.5
fsspec               2024.3.1
gevent
great_tables
matplotlib           3.8.0
nest_asyncio
numpy                1.26.2
openpyxl             3.1.5
pandas               2.1.1
pyarrow              16.1.0
pydantic             2.5.3
pyiceberg
sqlalchemy           1.4.49
torch
xlsx2csv             0.8.3
xlsxwriter           3.2.0
```

v-wei40680 commented 2 months ago

In Polars, you don't need to use `with open()` to open a file for writing. Polars provides methods that write directly to a file, such as df.write_excel() (and its counterparts for other formats), which handle the file path themselves.

The correct usage is:

df.write_excel("dataframe.xlsx")

This way, Polars opens and closes the file automatically, so wrapping the call in `with open()` is unnecessary.

Translated with DeepL.com (free version)

mcrumiller commented 2 months ago

@v-wei40680 sorry, can you post in English if possible? Here is a translation of your post (using Google Translate):

In Polars, you don't need to use with open() to open files for writing. Polars provides methods for writing files directly, such as df.write_excel() (or other formats), which directly handle file paths.

The correct usage is:

df.write_excel("dataframe.xlsx")

littleblubber commented 2 months ago

Thanks both. That's correct for local paths, but that solution does not work if I'm using fsspec/s3fs file system classes to open the file and write to it.

As mentioned above, this means that df.write_excel is inconsistent with the other write methods, which do support file objects opened through file system classes.
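
Until this is fixed, one possible workaround is to write into an in-memory buffer and then copy the bytes to the real file object. The sketch below uses only the standard library; `write_via_buffer` is a hypothetical helper, and the stand-in writer exists only to show the mechanics.

```python
import io
from typing import BinaryIO, Callable


def write_via_buffer(write_fn: Callable[[io.BytesIO], object], target: BinaryIO) -> int:
    """Hypothetical helper: let write_fn fill a BytesIO, then copy it to target."""
    buffer = io.BytesIO()
    write_fn(buffer)            # e.g. df.write_excel, per the assumption below
    data = buffer.getvalue()    # full serialized payload held in memory
    target.write(data)
    return len(data)


# Stand-in writer that just emits a few bytes.
sink = io.BytesIO()
written = write_via_buffer(lambda b: b.write(b"example payload"), sink)
print(written, sink.getvalue())
```

With Polars this might be called as `write_via_buffer(df.write_excel, f)`, where `f` is the fsspec/s3fs handle. That assumes `write_excel` accepts a `BytesIO` as its first argument, as the `isinstance(workbook, BytesIO)` check mentioned above suggests. The cost is that the whole workbook is held in memory before it is copied out.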

not-so-rabh commented 2 months ago

Hello, I'm new here. Can I take this up?