Open daviewales opened 1 month ago
While `polars` supports the Python standard `pathlib.Path` as a source, what actually happens is that it reads the path string from the input `Path` (in `_prepare_file_arg` in the traceback) and passes it to the Rust backend. `polars` does not yet support arbitrary Python path objects like `zipfile.Path` (or, frankly speaking, a path object is not a well-defined concept in Python, so `polars` cannot properly check whether something is a path object or not), so I think your workaround is indeed the best current solution. In any case, the current internal mechanism of `polars` requires reading the file content into memory on the Python side, so there is no performance penalty even if `zipfile.Path` support is added.
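To illustrate the point about path objects, here is a small standard-library sketch. It builds a throwaway archive (the names `test.zip` and `test.csv` are placeholders mirroring the sample file, and `describe_zip_path` is a hypothetical helper) and shows that a `zipfile.Path` is neither a `pathlib.Path` nor an `os.PathLike`, and that its string form does not name a real file on disk. Passing that string along as a path would therefore not help a backend that opens files directly.

```python
import os
import pathlib
import tempfile
import zipfile


def describe_zip_path(zp: zipfile.Path) -> dict:
    """Collect the properties a path-string based reader would rely on."""
    as_text = str(zp)  # archive filename joined with the member name
    return {
        "is_pathlib_path": isinstance(zp, pathlib.Path),
        "is_os_pathlike": isinstance(zp, os.PathLike),
        "string_form": as_text,
        "exists_on_disk": os.path.exists(as_text),
        "exists_in_archive": zp.exists(),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder archive with a single CSV member.
    archive = os.path.join(tmp, "test.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("test.csv", "a,b\n1,2\n")

    with zipfile.ZipFile(archive) as zf:
        info = describe_zip_path(zipfile.Path(zf, at="test.csv"))

print(info)
```

The member exists inside the archive, but the string form (archive path plus member name) is not something the filesystem can open.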
Thanks. I guess any future improvement would depend on the Rust backend gaining support for reading/scanning paths in zip files? And then a special case could be added to the Python library to extract the required information from `zipfile.Path` and pass it to the Rust backend?
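The Python-side half of that idea might look roughly like the sketch below. `split_zip_path` and `ZipMember` are hypothetical names, not part of polars. The sketch also assumes the `root` and `at` attributes of `zipfile.Path`, which exist in CPython's implementation but are not prominently documented. How a Rust backend would consume the resulting pair is left open.

```python
import os
import tempfile
import zipfile
from typing import NamedTuple, Optional


class ZipMember(NamedTuple):
    """Hypothetical container for what a backend would need."""
    archive: Optional[str]  # path of the zip file on disk, if it has one
    member: str             # name of the entry inside the archive


def split_zip_path(zp: zipfile.Path) -> ZipMember:
    # Assumption: `root` is the underlying ZipFile and `at` the member name.
    return ZipMember(archive=zp.root.filename, member=zp.at)


with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "test.zip")
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("test.csv", "a,b\n1,2\n")

    with zipfile.ZipFile(archive_path) as zf:
        parts = split_zip_path(zipfile.Path(zf, at="test.csv"))

    archive_matches = parts.archive == archive_path

print(parts.member)
```

Note that `archive` can be `None` when the `ZipFile` wraps an unnamed file object, in which case there is no on-disk path to hand over.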
Description
Polars IO methods such as `scan_csv` and `read_csv` support regular Python `Path` objects for `source`. However, it would be nice to also be able to read `zipfile.Path` objects.

Sample file: test.zip
I would expect the following to work if this feature is implemented:
Currently I get the following traceback:
I am using the following workaround:
This works, but displays the following warning:
Also, the workaround only works for `read_csv`. It does not work for `scan_csv`, which raises the following traceback (probably because `read_csv` supports `IO` sources, but `scan_csv` doesn't):
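A workaround along these lines, not necessarily the exact code referred to above, could be sketched with the standard library alone. The helper names `zip_member_to_buffer` and `extract_zip_member` are hypothetical, and the archive contents are placeholder data. The first helper loads the member into memory for an eager reader. The second copies it to a real file, which is one possible way to give a path-only reader such as `scan_csv` something it can open.

```python
import io
import pathlib
import shutil
import tempfile
import zipfile


def zip_member_to_buffer(zp: zipfile.Path) -> io.BytesIO:
    """Read the whole member into memory; suitable for eager readers."""
    return io.BytesIO(zp.read_bytes())


def extract_zip_member(zp: zipfile.Path, dest_dir: str) -> pathlib.Path:
    """Copy the member to a real file so path-only readers can open it."""
    dest = pathlib.Path(dest_dir) / zp.name
    with zp.open("rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / "test.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("test.csv", "a,b\n1,2\n")

    with zipfile.ZipFile(archive) as zf:
        member = zipfile.Path(zf, at="test.csv")
        buffer = zip_member_to_buffer(member)
        extracted = extract_zip_member(member, tmp)
        extracted_text = extracted.read_text()

    # With polars installed, these could then be used as:
    #   pl.read_csv(buffer)      # eager read from the in-memory buffer
    #   pl.scan_csv(extracted)   # lazy scan of the extracted file
```

The extraction route costs disk space and an extra copy, but it keeps the lazy API usable. The extracted file must outlive the lazy query, so in practice the temporary directory should stay open until the result is collected.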