pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Scan from HTTP path errors with `empty host` #17592

Closed douglas-raillard-arm closed 3 months ago

douglas-raillard-arm commented 3 months ago

Reproducible example

```python
import polars as pl
df = pl.scan_parquet('http://username:password@a.url.com/a/file.parquet')
df.collect()
```

Log output

polars.exceptions.ComputeError: empty host

Issue description

When the URL passed to scan_parquet() includes basic auth (username and password), polars drops nearly all of the URL: the host, credentials and path are gone and only `https:///` remains, as can be seen from df.serialize(format='json'):

'{"Scan":{"paths":["https:///"],"file_info":null,"hive_parts":null,"predicate":null,"file_options":{"n_rows":null,"with_columns":null,"cache":true,"row_index":null,"rechunk":false,"file_counter":0,"hive_options":{"enabled":true,"hive_start_idx":0,"schema":null,"try_parse_dates":true}},"scan_type":{"Parquet":{"options":{"parallel":"Auto","low_memory":false,"use_statistics":true},"cloud_options":null}}}}'

The issue may be unrelated to basic auth itself and just a general problem with HTTP support in polars.

Expected behavior

The URL should be preserved.
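
To make "preserved" concrete, here is a minimal sketch using only the Python standard library (not polars itself) that compares the components of the URL from the reproducible example with those of the `https:///` path found in the serialized plan. The variable names are illustrative.

```python
from urllib.parse import urlsplit

# URL passed to scan_parquet() in the reproducible example.
original = urlsplit("http://username:password@a.url.com/a/file.parquet")

# Path that ends up in the serialized plan.
serialized = urlsplit("https:///")

for label, parts in (("original", original), ("serialized", serialized)):
    print(label)
    print("  scheme:  ", parts.scheme)
    print("  username:", parts.username)
    print("  password:", parts.password)
    print("  hostname:", parts.hostname)
    print("  path:    ", parts.path)

# The serialized path has no host at all, which matches the
# "empty host" error raised on collect().
host_was_lost = original.hostname is not None and serialized.hostname is None
print("host lost:", host_was_lost)
```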

Installed versions

```
--------Version info---------
Polars: 1.1.0
Index type: UInt32
Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.31
Python: 3.12.4 (main, Jun 8 2024, 18:29:57) [GCC 9.4.0]
----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec: 2024.6.1
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio: 1.6.0
numpy: 2.0.0
openpyxl:
pandas: 2.2.2
pyarrow: 16.1.0
pydantic: 2.8.2
pyiceberg:
sqlalchemy: 2.0.31
torch:
xlsx2csv:
xlsxwriter:
```

douglas-raillard-arm commented 3 months ago

It was working in version 0.20.31; the issue first appeared in 1.0.0.

ritchie46 commented 3 months ago

@nameexhaustion can you take a look?

nameexhaustion commented 3 months ago

I think this is fixed on main, by https://github.com/pola-rs/polars/pull/17571

douglas-raillard-arm commented 3 months ago

I just compiled the main branch and it's indeed fixed, thank you :)
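
For anyone stuck on an affected release, one possible workaround (a sketch, not something proposed in this thread) is to keep the credentials out of the URL, send them as an `Authorization` header, and hand the downloaded bytes to `pl.read_parquet`. The helper names `split_basic_auth` and `read_parquet_with_basic_auth` are hypothetical. The sketch assumes a plain hostname (no IPv6 literal) and credentials without percent-encoding, and it downloads the whole file eagerly, so the lazy behaviour of `scan_parquet` is lost.

```python
import base64
import io
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def split_basic_auth(url):
    """Return (url_without_credentials, headers) for a URL that may embed basic auth.

    Hypothetical helper for illustration; assumes a plain hostname.
    """
    parts = urlsplit(url)
    headers = {}
    if parts.username is not None:
        credentials = f"{parts.username}:{parts.password or ''}"
        token = base64.b64encode(credentials.encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    # Rebuild the network location without the "user:password@" prefix.
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, headers


def read_parquet_with_basic_auth(url):
    """Download the file eagerly and parse it with polars (hypothetical helper)."""
    import polars as pl  # imported here so the URL helper works without polars

    clean, headers = split_basic_auth(url)
    request = urllib.request.Request(clean, headers=headers)
    with urllib.request.urlopen(request) as response:
        return pl.read_parquet(io.BytesIO(response.read()))


clean_url, auth_headers = split_basic_auth(
    "http://username:password@a.url.com/a/file.parquet"
)
print(clean_url)
```

Calling `read_parquet_with_basic_auth(...)` with the original URL would then return an eager DataFrame instead of a LazyFrame.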