pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

ComputeError with S3 query when using Polars 1.14.0, works fine in 1.13.0. #19969

Open ghaffarialireza opened 2 days ago

ghaffarialireza commented 2 days ago

Reproducible example


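```python
import os

import polars as pl

# Assumption: credentials are supplied via environment variables.
aws_default_region = os.environ["AWS_DEFAULT_REGION"]
aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]

pl.scan_delta(
    "s3://..",
    storage_options={
        "aws_default_region": aws_default_region,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    },
).filter(
    pl.col("date") >= pl.col("date").max().dt.offset_by("-2y"),
    pl.col("date") < pl.col("date").max().dt.offset_by("-3mo"),
).collect()
```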

Log output

```
ComputeError: Generic S3 error: Client error with status 400 Bad Request: No Body
```

Issue description

This error occurs only in Polars 1.14.0 when querying data stored in S3 with `scan_delta`; the same code works as expected in 1.13.0. It seems related to a change in how Polars interacts with S3 in this version.

Expected behavior

The query should execute successfully and fetch data from the S3 bucket, as it does in Polars 1.13.0.

Installed versions

```
--------Version info---------
Polars:              1.14.0
Index type:          UInt32
Platform:            Linux-5.10.227-219.884.amzn2.x86_64-x86_64-with-glibc2.26
Python:              3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  1.3.0
altair               5.5.0
boto3                1.35.68
cloudpickle          2.2.1
connectorx
deltalake            0.22.0
fastexcel            0.12.0
fsspec               2024.10.0
gevent               24.11.1
google.auth
great_tables
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.0.0
pydantic             2.10.1
pyiceberg            0.8.0
sqlalchemy           2.0.36
torch
xlsx2csv             0.8.4
xlsxwriter           3.2.0
```

BartSchuurmans commented 1 day ago

Looking through the 1.14.0 changes, this one seems suspect: https://github.com/pola-rs/polars/pull/19103

Especially since I get the same error message from `scan_parquet` in this other issue: https://github.com/pola-rs/polars/issues/19933