Closed astrowonk closed 4 months ago
Could you do a bisect to find which commit is involved? I don't have azure access.
@nameexhaustion FYI
I'll give it a try. I've never built anything from Rust before, but I've started the bisect and a make build.
EDITED to say: things are, alas, not going well! I may try again later, but I don't have time to troubleshoot the compiling process. If anyone else tries to track this down, please post here!
Can you show the backtrace if you set POLARS_PANIC_ON_ERR=1 and RUST_BACKTRACE=1? Ideally on a debug build.
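For anyone reproducing this from a notebook or script rather than a shell, here is a minimal sketch of setting the two variables from Python. The helper name `enable_polars_debug_env` is hypothetical. Setting the variables before importing polars is a conservative assumption, in case they are read early.

```python
import os


def enable_polars_debug_env() -> None:
    """Set the two environment variables requested in this thread."""
    # Ask polars to panic instead of returning an error.
    os.environ["POLARS_PANIC_ON_ERR"] = "1"
    # Ask the Rust runtime to print a backtrace when a panic occurs.
    os.environ["RUST_BACKTRACE"] = "1"


enable_polars_debug_env()
# Assumption: import polars only after the variables are set, e.g.
# import polars as pl
```

From a shell, prefixing the command with the same two assignments should have the same effect.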
This Azure/Parquet code works fine for me using 1.0.0 and Python 3.12.4 on Darwin 23.5.0 (Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020 arm64):
import polars as pl
# my_options holds the Azure storage credentials (not shown here)
df = pl.scan_parquet(source='az://mycontainer/myfile.parquet', storage_options=my_options)
print(df.collect())
I get this output on a sample parquet file:
shape: (8, 3)
┌─────┬──────────┬───────┐
│ a ┆ b ┆ d │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪══════════╪═══════╡
│ 0 ┆ 0.799578 ┆ 1.0 │
│ 1 ┆ 0.615038 ┆ 2.0 │
│ 2 ┆ 0.476025 ┆ NaN │
│ 3 ┆ 0.403242 ┆ NaN │
│ 4 ┆ 0.208607 ┆ 0.0 │
│ 5 ┆ 0.281009 ┆ -5.0 │
│ 6 ┆ 0.890798 ┆ -42.0 │
│ 7 ┆ 0.38674 ┆ null │
└─────┴──────────┴───────┘
Can you show the backtrace if you set POLARS_PANIC_ON_ERR=1 and RUST_BACKTRACE=1? Ideally on a debug build.
Here is the error with those variables set. I have still had no luck compiling polars, so this is just with the pip-installed version on RHEL 8.9; see the Details block below.
run LazyFrame.show_graph() to see" 639 f" the optimized version
{svg.decode()}" 640 ) PanicException: expected at least 1 path
Have you tried removing the space from the path? source='az://car-groupings/appr 2024-06-30.parquet'
Any container/blob combination hits this error with scan_parquet, regardless of the blob name (and blob names can have spaces). The error doesn't happen in pre-1.0.0 releases.
Have you tried removing the space from the path?
source='az://car-groupings/appr 2024-06-30.parquet'
Wait! I thought I had tested this and got the same error but I was using the wrong storage account.
1.0.0rc2 can handle blob names with and without spaces. 1.0.0 can handle blob names/az URLs without spaces, but not with spaces. I saved two parquet blobs, one with a space in the name and one without:
az://my-blobs/test space.parquet fails in 1.0.0 (works with 1.0.0rc2)
az://my-blobs/test-space.parquet works in 1.0.0 (and 1.0.0rc2)
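To repeat this comparison across versions, something like the following sketch may help. The helper `check_scans` is hypothetical. It takes the scan function as a parameter, so `pl.scan_parquet` can be passed in under each installed polars version, and it records per URL whether the scan and collect succeeded. It assumes POLARS_PANIC_ON_ERR is not set, since a panic may not be caught by `except Exception`.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def check_scans(
    urls: Iterable[str],
    scan: Callable[..., Any],
    storage_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Try to scan and collect each URL; map each URL to 'ok' or the error text."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            # The failure reported in this thread is ComputeError, which this catches.
            scan(url, storage_options=storage_options).collect()
            results[url] = "ok"
        except Exception as exc:
            results[url] = f"{type(exc).__name__}: {exc}"
    return results


# Hypothetical usage with the two blobs from this thread:
# import polars as pl
# print(check_scans(
#     ["az://my-blobs/test space.parquet", "az://my-blobs/test-space.parquet"],
#     pl.scan_parquet,
#     storage_options=my_options,
# ))
```

Running the same call under 1.0.0rc2 and 1.0.0 should make the difference between the two blob names visible in one place.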
@ritchie46
Checks
Reproducible example
This works and creates a LazyFrame, and operations on this lazy frame work in 0.20.31. The exact same code fails with ComputeError: expected at least 1 path in 1.0.0.
I also tested the release candidates for 1.0, and they worked fine: with both rc1 and rc2, the lazy frames are created. Only the 1.0.0 release today has this ComputeError.
Log output
No response
Issue description
Scan parquet from Azure functionality is broken in 1.0.0.
Expected behavior
The lazy frame should get created.
Installed versions