pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Allow `os.PathLike` objects in `read_*` and `scan_*` functions #17828

Open edavisau opened 3 months ago

edavisau commented 3 months ago

Description

This basically already works, because the current `normalize_filepath` implementation already handles `os.PathLike` objects (it just needs updated type hints).

The only place where a code change is needed is here:

if isinstance(source, (str, Path)):  # proposed: (str, os.PathLike) instead
    source = normalize_filepath(source, check_not_directory=False)
else:
    source = [
        normalize_filepath(s, check_not_directory=False) for s in source
    ]

The rest of the change would be updating the type hints.
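A minimal, self-contained sketch of the proposed check together with widened hints, assuming a stand-in normalizer. `PathInput`, `normalize_filepath_sketch`, `normalize_source` and `MyPath` are hypothetical names; the stand-in only calls `os.fspath` and expands `~`, and the real polars helper may do more. The key point is that `os.PathLike` recognises any class defining `__fspath__`, so `pathlib.Path` still passes the widened `isinstance` check.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Sequence, Union

# Hypothetical alias: a single source is a str or anything implementing __fspath__.
PathInput = Union[str, "os.PathLike[str]"]


def normalize_filepath_sketch(path: PathInput) -> str:
    # Stand-in for polars' normalize_filepath (assumption, not the real helper):
    # os.fspath converts any path-like object to str, then "~" is expanded.
    return os.path.expanduser(os.fspath(path))


def normalize_source(source: PathInput | Sequence[PathInput]) -> str | list[str]:
    # Proposed check: os.PathLike instead of Path. Path still matches, because
    # os.PathLike treats any class defining __fspath__ as a virtual subclass.
    if isinstance(source, (str, os.PathLike)):
        return normalize_filepath_sketch(source)
    # Anything else is assumed to be an iterable of single sources.
    return [normalize_filepath_sketch(s) for s in source]


class MyPath:
    def __fspath__(self) -> str:
        return "data.parquet"


print(normalize_source(MyPath()))                     # data.parquet
print(normalize_source(Path("data.parquet")))         # data.parquet
print(normalize_source([MyPath(), "other.parquet"]))  # ['data.parquet', 'other.parquet']
```

Checking `os.PathLike` rather than listing concrete classes keeps the single-source branch open to any object that follows PEP 519, without polars having to know about it.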

Example

import os
import tempfile
import polars as pl

tmp_file = tempfile.NamedTemporaryFile()
pl.DataFrame(dict(a=[1,2,3])).write_parquet(tmp_file.name)

class Foo:
    def __fspath__(self):
        return tmp_file.name

isinstance(Foo(), os.PathLike)
# True

pl.read_parquet(Foo())
# TypeError: 'Foo' object is not iterable

pl.read_parquet([Foo()])
# shape: (3, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 2   │
# │ 3   │
# └─────┘
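The difference between the two calls above can be reproduced without polars, assuming the current `(str, Path)` dispatch quoted earlier. `dispatch_current` is a hypothetical stand-in that uses `os.fspath` in place of `normalize_filepath`: a bare `Foo()` fails the `isinstance` check, falls into the `else` branch, and is iterated, while `[Foo()]` is iterated successfully.

```python
import os
from pathlib import Path


class Foo:
    def __fspath__(self) -> str:
        return "data.parquet"


def dispatch_current(source):
    # Mirrors the current check: only str and Path count as a single path.
    if isinstance(source, (str, Path)):
        return os.fspath(source)
    # Everything else is iterated, so a bare Foo() raises TypeError here.
    return [os.fspath(s) for s in source]


print(isinstance(Foo(), os.PathLike))  # True
print(dispatch_current([Foo()]))       # ['data.parquet']
try:
    dispatch_current(Foo())
except TypeError as exc:
    print(exc)                         # 'Foo' object is not iterable
```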
stinodego commented 3 months ago

We should be able to support os.PathLike inputs.

A PR is welcome if it includes extensive testing for various path-like inputs other than pathlib.Path.
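A stdlib-only sketch of inputs such tests could cover, assuming the goal is path-like objects that are not `pathlib.Path`: a class that only defines `__fspath__`, an explicit `os.PathLike` subclass, and an `os.DirEntry` yielded by `os.scandir`. `DuckPath`, `ExplicitPath` and `pathlike_variants` are hypothetical names, and the file created here is only a placeholder.

```python
import os
import tempfile


class DuckPath:
    # Implements only __fspath__; recognised via os.PathLike's subclass hook.
    def __init__(self, path: str) -> None:
        self._path = path

    def __fspath__(self) -> str:
        return self._path


class ExplicitPath(os.PathLike):
    # Inherits from os.PathLike explicitly instead of relying on the hook.
    def __init__(self, path: str) -> None:
        self._path = path

    def __fspath__(self) -> str:
        return self._path


def pathlike_variants(path: str) -> list:
    """Return several non-pathlib path-like objects that all point at `path`."""
    parent, name = os.path.split(path)
    # os.DirEntry objects yielded by os.scandir are path-like as well.
    with os.scandir(parent) as entries:
        entry = next(e for e in entries if e.name == name)
    return [DuckPath(path), ExplicitPath(path), entry]


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.parquet")
    open(target, "wb").close()  # placeholder; a real test would write parquet data
    for obj in pathlike_variants(target):
        assert isinstance(obj, os.PathLike)
        assert os.fspath(obj) == target
```

In an actual polars test, each of these objects would presumably be passed to `pl.read_parquet` (and the other `read_*` / `scan_*` functions) after writing real data to `target`, for both the single-source and list-of-sources forms.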