pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Cannot scan cloud files containing spaces in path name #17335

Closed astrowonk closed 4 months ago

astrowonk commented 4 months ago

Reproducible example

import polars as pl

pl.scan_parquet(source='az://mycontainer/myfile.parquet',
                storage_options=my_options)

In 0.20.31 this creates a LazyFrame, and operations on that LazyFrame work. In 1.0.0 the exact same code fails with ComputeError: expected at least 1 path.

I also tested the 1.0 release candidates, rc1 and rc2, and both worked fine: the LazyFrames are created. Only the 1.0.0 release published today raises this ComputeError.

Log output

No response

Issue description

Scanning Parquet files from Azure with scan_parquet is broken in 1.0.0.

Expected behavior

The lazy frame should get created.

Installed versions

```
--------Version info---------
Polars:               1.0.0
Index type:           UInt32
Platform:             Linux-4.18.0-513.24.1.el8_9.x86_64-x86_64-with-glibc2.28
Python:               3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:          3.0.0
connectorx:
deltalake:
fastexcel:
fsspec:               2024.3.1
gevent:
great_tables:
hvplot:
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.24.4
openpyxl:             3.1.2
pandas:               2.1.4
pyarrow:              14.0.2
pydantic:
pyiceberg:
sqlalchemy:           1.4.52
torch:
xlsx2csv:
xlsxwriter:           3.1.1
```

ritchie46 commented 4 months ago

Could you do a bisect to find which commit is involved? I don't have azure access.

@nameexhaustion FYI

astrowonk commented 4 months ago

> Could you do a bisect to find which commit is involved? I don't have azure access.
>
> @nameexhaustion FYI

I'll give it a try. I've never built anything from Rust before, but I've started a bisect and kicked off a make build.

EDITED to say: things are, alas, not going well! I may try again later, but I don't have time to troubleshoot the compile process right now. If anyone else tries to track this down, please post here!

ritchie46 commented 4 months ago

Can you show the backtrace if you set POLARS_PANIC_ON_ERR=1 and RUST_BACKTRACE=1?

Ideally on a debug build.
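
For anyone reproducing this from a notebook or script, here is a minimal sketch of setting the two variables from Python instead of the shell. The helper name `enable_polars_backtrace` is made up for illustration, and setting the variables before importing polars is a conservative assumption rather than a documented requirement.

```python
import os


def enable_polars_backtrace() -> dict:
    """Set the two variables requested above and return what was set.

    Assumption: this runs before `import polars`, so the Rust side
    sees the values when the first error is raised.
    """
    os.environ["POLARS_PANIC_ON_ERR"] = "1"
    os.environ["RUST_BACKTRACE"] = "1"
    return {k: os.environ[k] for k in ("POLARS_PANIC_ON_ERR", "RUST_BACKTRACE")}


settings = enable_polars_backtrace()
print(settings)

# Only now import polars and re-run the failing scan, e.g.:
# import polars as pl
```

With `POLARS_PANIC_ON_ERR=1` the error is expected to surface as a panic with a Rust backtrace rather than as a regular Polars exception.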

Bidek56 commented 4 months ago

This Azure/Parquet code works fine for me using 1.0.0 and Python 3.12.4 on macOS: Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020 arm64

df = pl.scan_parquet(source='az://mycontainer/myfile.parquet', storage_options=my_options)
print(df.collect())

I get this output on a sample parquet file:

shape: (8, 3)
┌─────┬──────────┬───────┐
│ a   ┆ b        ┆ d     │
│ --- ┆ ---      ┆ ---   │
│ i64 ┆ f64      ┆ f64   │
╞═════╪══════════╪═══════╡
│ 0   ┆ 0.799578 ┆ 1.0   │
│ 1   ┆ 0.615038 ┆ 2.0   │
│ 2   ┆ 0.476025 ┆ NaN   │
│ 3   ┆ 0.403242 ┆ NaN   │
│ 4   ┆ 0.208607 ┆ 0.0   │
│ 5   ┆ 0.281009 ┆ -5.0  │
│ 6   ┆ 0.890798 ┆ -42.0 │
│ 7   ┆ 0.38674  ┆ null  │
└─────┴──────────┴───────┘
astrowonk commented 4 months ago

> Can you show the backtrace if you set POLARS_PANIC_ON_ERR=1 and RUST_BACKTRACE=1?
>
> Ideally on a debug build.

Here is the error with those variables set, in the Details block below. I still have had no luck compiling polars, so this is from the pip-installed version on RHEL 8.9.

Details

```
thread '' panicked at /home/runner/work/polars/polars/crates/polars-error/src/lib.rs:23:13:
expected at least 1 path
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: >::from::panic_cold_display
   3: >::from
   4: polars_plan::plans::conversion::scans::parquet_file_info
   5: polars_plan::plans::conversion::dsl_to_ir::to_alp_impl::{{closure}}
   6: polars_plan::plans::conversion::dsl_to_ir::to_alp_impl
   7: polars_plan::plans::conversion::dsl_to_ir::to_alp
   8: polars_lazy::frame::LazyFrame::to_alp
   9: polars::lazyframe::PyLazyFrame::__pymethod_to_dot__
  10: pyo3::impl_::trampoline::trampoline
  11: polars::lazyframe::_::__INVENTORY::trampoline
  12: cfunction_call at /usr/local/src/conda/python-3.10.13/Objects/methodobject.c:543:19
  13: _PyObject_MakeTpCall at /usr/local/src/conda/python-3.10.13/Objects/call.c:215:18
  14: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:112:16
  15: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:99:1
  16: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  17: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  18: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:19
  19: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  20: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  21: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  22: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  23: method_vectorcall at /usr/local/src/conda/python-3.10.13/Objects/classobject.c:53:18
  24: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  25: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  26: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  27: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
  28: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  29: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  30: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  31: do_call_core at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5945:12
  32: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4277:22
  33: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  34: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  35: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  36: do_call_core at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5945:12
  37: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4277:22
  38: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  39: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  40: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  41: _PyObject_FastCallDictTstate at /usr/local/src/conda/python-3.10.13/Objects/call.c:142:15
  42: _PyObject_Call_Prepend at /usr/local/src/conda/python-3.10.13/Objects/call.c:431:24
  43: slot_tp_call at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:7494:15
  44: _PyObject_MakeTpCall at /usr/local/src/conda/python-3.10.13/Objects/call.c:215:18
  45: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:112:16
  46: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:99:1
  47: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  48: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  49: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
  50: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  51: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  52: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  53: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  54: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  55: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  56: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
  57: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  58: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  59: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  60: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  61: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  62: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  63: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
  64: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  65: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  66: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  67: _PyObject_FastCallDictTstate at /usr/local/src/conda/python-3.10.13/Objects/call.c:142:15
  68: _PyObject_Call_Prepend at /usr/local/src/conda/python-3.10.13/Objects/call.c:431:24
  69: slot_tp_call at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:7494:15
  70: _PyObject_MakeTpCall at /usr/local/src/conda/python-3.10.13/Objects/call.c:215:18
  71: PyObject_CallOneArg at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:184:12
  72: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2406:19
  73: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  74: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  75: PyEval_EvalCode at /usr/local/src/conda/python-3.10.13/Python/ceval.c:1134:12
  76: builtin_exec_impl at /usr/local/src/conda/python-3.10.13/Python/bltinmodule.c:1058:13
  77: builtin_exec at /usr/local/src/conda/python-3.10.13/Python/clinic/bltinmodule.c.h:371:20
  78: cfunction_vectorcall_FASTCALL at /usr/local/src/conda/python-3.10.13/Objects/methodobject.c:430:24
  79: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  80: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  81: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  82: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
  83: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  84: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
  85: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
  86: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  87: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
  88: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
  89: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  90: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
  91: gen_send_ex at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:279:9
  92: method_vectorcall_O at /usr/local/src/conda/python-3.10.13/Objects/descrobject.c:460:24
  93: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  94: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  95: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  96: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
  97: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  98: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  99: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 100: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 101: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 102: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 103: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
 104: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 105: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 106: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 107: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 108: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 109: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 110: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 111: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 112: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 113: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 114: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 115: method_vectorcall at /usr/local/src/conda/python-3.10.13/Objects/classobject.c:53:18
 116: PyVectorcall_Call at /usr/local/src/conda/python-3.10.13/Objects/call.c:267:24
 117: _PyObject_Call at /usr/local/src/conda/python-3.10.13/Objects/call.c:290:16
 118: PyObject_Call at /usr/local/src/conda/python-3.10.13/Objects/call.c:317:12
 119: do_call_core at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5945:12
 120: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4277:22
 121: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 122: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 123: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 124: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 125: method_vectorcall at /usr/local/src/conda/python-3.10.13/Objects/classobject.c:53:18
 126: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 127: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 128: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 129: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:19
 130: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 131: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 132: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
 133: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 134: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 135: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
 136: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 137: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 138: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
 139: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 140: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 141: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
 142: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 143: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 144: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:2586:30
 145: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 146: gen_send_ex2 at /usr/local/src/conda/python-3.10.13/Objects/genobject.c:213:14
 147: task_step_impl at /usr/local/src/conda/python-3.10.13/Modules/_asynciomodule.c:2653:22
 148: task_step at /usr/local/src/conda/python-3.10.13/Modules/_asynciomodule.c:2950:11
 149: cfunction_vectorcall_O at /usr/local/src/conda/python-3.10.13/Objects/methodobject.c:516:24
 150: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 151: context_run at /usr/local/src/conda/python-3.10.13/Python/context.c:665
 152: cfunction_vectorcall_FASTCALL_KEYWORDS at /usr/local/src/conda/python-3.10.13/Objects/methodobject.c:446:24
 153: do_call_core at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5917:9
 154: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4277:22
 155: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 156: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 157: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 158: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 159: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 160: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 161: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 162: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 163: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 164: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 165: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 166: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 167: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 168: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 169: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 170: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 171: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 172: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 173: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 174: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 175: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 176: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 177: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 178: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 179: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 180: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 181: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 182: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 183: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 184: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 185: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 186: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 187: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 188: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 189: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4198:23
 190: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 191: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 192: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
 193: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 194: method_vectorcall at /usr/local/src/conda/python-3.10.13/Objects/classobject.c:53:18
 195: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 196: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 197: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 198: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4181:23
 199: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 200: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 201: PyEval_EvalCode at /usr/local/src/conda/python-3.10.13/Python/ceval.c:1134:12
 202: builtin_exec_impl at /usr/local/src/conda/python-3.10.13/Python/bltinmodule.c:1058:13
 203: builtin_exec at /usr/local/src/conda/python-3.10.13/Python/clinic/bltinmodule.c.h:371:20
 204: cfunction_vectorcall_FASTCALL at /usr/local/src/conda/python-3.10.13/Objects/methodobject.c:430:24
 205: _PyObject_VectorcallTstate at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
 206: PyObject_Vectorcall at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
 207: call_function at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
 208: _PyEval_EvalFrameDefault at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
 209: _PyEval_EvalFrame at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
 210: _PyEval_Vector at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
 211: _PyFunction_Vectorcall at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[7], line 1
----> 1 pl.scan_parquet(source='az://car-groupings/appr 2024-06-30.parquet',
      2                 storage_options={'account_name': 'cpsprodeaststorage'})

File /opt/miniconda/lib/python3.10/site-packages/IPython/core/displayhook.py:268, in DisplayHook.__call__(self, result)
    266 self.start_displayhook()
    267 self.write_output_prompt()
--> 268 format_dict, md_dict = self.compute_format_data(result)
    269 self.update_user_ns(result)
    270 self.fill_exec_result(result)

File /opt/miniconda/lib/python3.10/site-packages/IPython/core/displayhook.py:157, in DisplayHook.compute_format_data(self, result)
    127 def compute_format_data(self, result):
    128     """Compute format data of the object to be displayed.
    129
    130     The format data is a generalization of the :func:`repr` of an object.
   (...)
    155
    156     """
--> 157     return self.shell.display_formatter.format(result)

File /opt/miniconda/lib/python3.10/site-packages/IPython/core/formatters.py:182, in DisplayFormatter.format(self, obj, include, exclude)
    180 md = None
    181 try:
--> 182     data = formatter(obj)
    183 except:
    184     # FIXME: log the exception
    185     raise

File /opt/miniconda/lib/python3.10/site-packages/decorator.py:232, in decorate.<locals>.fun(*args, **kw)
    230 if not kwsyntax:
    231     args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)

File /opt/miniconda/lib/python3.10/site-packages/IPython/core/formatters.py:226, in catch_format_error(method, self, *args, **kwargs)
    224 """show traceback on failed format call"""
    225 try:
--> 226     r = method(self, *args, **kwargs)
    227 except NotImplementedError:
    228     # don't warn on NotImplementedErrors
    229     return self._check_return(None, args[0])

File /opt/miniconda/lib/python3.10/site-packages/IPython/core/formatters.py:347, in BaseFormatter.__call__(self, obj)
    345 method = get_real_method(obj, self.print_method)
    346 if method is not None:
--> 347     return method()
    348 return None
    349 else:

File /opt/miniconda/lib/python3.10/site-packages/polars/lazyframe/frame.py:633, in LazyFrame._repr_html_(self)
    631 def _repr_html_(self) -> str:
    632     try:
--> 633         dot = self._ldf.to_dot(optimized=False)
    634         svg = subprocess.check_output(
    635             ["dot", "-Nshape=box", "-Tsvg"], input=f"{dot}".encode()
    636         )
    637         return (
    638             "<h4>NAIVE QUERY PLAN</h4><p>run <b>LazyFrame.show_graph()</b> to see"
    639             f" the optimized version</p>{svg.decode()}"
    640         )

PanicException: expected at least 1 path
```

Bidek56 commented 4 months ago

Have you tried removing the space from the path? source='az://car-groupings/appr 2024-06-30.parquet'

astrowonk commented 4 months ago

> Have you tried removing the space from the path?

Any container/blob combination used with scan_parquet raises this error, regardless of the blob name (and blob names can legitimately contain spaces). The error also doesn't happen in pre-1.0.0 releases.

astrowonk commented 4 months ago

> Have you tried removing the space from the path? source='az://car-groupings/appr 2024-06-30.parquet'

Wait! I thought I had tested this and got the same error, but I was using the wrong storage account.

1.0.0rc2 can handle blob names both with and without spaces. 1.0.0 can handle blob names / az URLs without spaces, but not with spaces. To confirm, I saved two parquet blobs, one with a space in the name and one without:

az://my-blobs/test space.parquet fails in 1.0.0 (works with 1.0.0rc2)
az://my-blobs/test-space.parquet works in 1.0.0 (and 1.0.0rc2)

@ritchie46
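
The two-blob comparison above can be scripted so it is easy to re-run on each Polars version. This is only a sketch: `compare_paths` and `fake_scan` are hypothetical names, and `fake_scan` is a stand-in that mimics the reported 1.0.0 behaviour so the snippet runs without Azure access.

```python
from typing import Callable, Dict, Iterable, Optional


def compare_paths(
    paths: Iterable[str],
    scan: Callable[[str], object],
) -> Dict[str, Optional[str]]:
    """Call `scan` on each path; map path -> error message, or None on success."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            scan(path)
            results[path] = None
        except Exception as exc:  # broad on purpose: we only record what failed
            results[path] = f"{type(exc).__name__}: {exc}"
    return results


def fake_scan(path: str) -> None:
    """Hypothetical stand-in: fails on spaces, like the 1.0.0 report above."""
    if " " in path:
        raise RuntimeError("expected at least 1 path")


report = compare_paths(
    ["az://my-blobs/test space.parquet", "az://my-blobs/test-space.parquet"],
    scan=fake_scan,
)
for path, error in report.items():
    print(f"{path!r}: {'ok' if error is None else error}")
```

Against a real storage account, `scan` could be something like `lambda p: pl.scan_parquet(p, storage_options=my_options).explain()`. Calling `.explain()` is an assumption on my part, chosen because the traceback above shows the error surfacing when the plan is resolved rather than inside `scan_parquet` itself. Leave `POLARS_PANIC_ON_ERR` unset for this check, since a panic is not caught by `except Exception`.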