pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Does Polars still use the external adlfs library for Azure? #15043

Open astrowonk opened 7 months ago

astrowonk commented 7 months ago

Description

The documentation for cloud storage says:

To read from cloud storage, additional dependencies may be needed depending on the use case and cloud storage provider:

It cites adlfs (for Azure) as a possible dependency to install with pip. But later it says:

Polars uses the object_store.rs library internally to manage the interface with the cloud storage providers and so no extra dependencies are required in Python to scan a cloud Parquet file.

I have uninstalled adlfs, and Polars' scan_parquet and read_parquet both still work. Is this line in the documentation outdated?
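For anyone who wants to repeat the check, here is a minimal sketch of what I mean, not an official recipe. The helper names (`adlfs_installed`, `scan_azure_parquet`), the `az://` URI, and the credential values in the comments are placeholders I made up for illustration; `pl.scan_parquet` and its `storage_options` parameter are the real Polars API.

```python
import importlib.util


def adlfs_installed() -> bool:
    """Return True if the adlfs package is importable in this environment."""
    return importlib.util.find_spec("adlfs") is not None


def scan_azure_parquet(uri: str, storage_options: dict):
    """Lazily scan a Parquet file on Azure with Polars' native reader.

    Polars is imported inside the function so that the adlfs check above
    can run even in an environment where Polars is not installed.
    """
    import polars as pl

    return pl.scan_parquet(uri, storage_options=storage_options)


# Example usage (hypothetical container, path, and credentials):
#
#   print("adlfs installed:", adlfs_installed())
#   lf = scan_azure_parquet(
#       "az://my-container/data/example.parquet",
#       {"account_name": "myaccount", "account_key": "<key>"},
#   )
#   print(lf.collect().head())
```

If `adlfs_installed()` returns False and the scan still succeeds, that would be consistent with the second quoted sentence from the docs rather than the first.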

Link

https://docs.pola.rs/user-guide/io/cloud-storage/

vishnu-ms commented 7 months ago

Is it possible to get more documentation/examples on storage_options with different auth methods (client-secret, managed identity, etc.)? The storage_options that adlfs typically uses conform to a different set of options: https://github.com/fsspec/adlfs?tab=readme-ov-file#setting-credentials.

astrowonk commented 7 months ago

> Is it possible to get more documentation/examples on storage_options with different auth methods (client-secret, managed identity, etc.)? The storage_options that adlfs typically uses conform to a different set of options: https://github.com/fsspec/adlfs?tab=readme-ov-file#setting-credentials.

The scan_parquet method documentation links to this Azure article on the storage_options keys for Azure. It would be good to bring those links to the cloud storage page.
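As a rough illustration of how the auth methods asked about above might map onto storage_options, here is a sketch under some assumptions. The key names (`account_name`, `account_key`, `tenant_id`, `client_id`, `client_secret`, `sas_token`) are the ones I believe object_store accepts for Azure, so please check them against the linked key reference before relying on them. The helper function names are hypothetical, and I have left out managed identity because I am not certain of the right keys for it.

```python
def account_key_options(account_name: str, account_key: str) -> dict:
    """storage_options for shared-key (storage account key) authentication."""
    return {"account_name": account_name, "account_key": account_key}


def client_secret_options(
    account_name: str, tenant_id: str, client_id: str, client_secret: str
) -> dict:
    """storage_options for a service principal (client-secret) login."""
    return {
        "account_name": account_name,
        "tenant_id": tenant_id,
        "client_id": client_id,
        "client_secret": client_secret,
    }


def sas_token_options(account_name: str, sas_token: str) -> dict:
    """storage_options for a shared access signature (SAS) token."""
    return {"account_name": account_name, "sas_token": sas_token}


# Example usage (hypothetical values), passing the dict to Polars:
#
#   import polars as pl
#   opts = client_secret_options("myaccount", "<tenant>", "<client>", "<secret>")
#   lf = pl.scan_parquet("az://my-container/data/*.parquet", storage_options=opts)
```

Note that these are plain string keys handed to the Rust side, which is why they differ from the keyword arguments adlfs takes.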

astrowonk commented 4 months ago

Seems like an easy doc fix if I'm correct… just bumping this after 3 months.