pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Cannot request minio as s3 endpoint: "The request signature we calculated does not match the signature you provided" #18405

Open Kuinox opened 3 months ago

Kuinox commented 3 months ago

Reproducible example

With a MinIO instance running locally:

import polars as pl

storage_options = {
    "aws_access_key_id": "redacted",
    "aws_session_token": "redacted",
    "aws_region": "eu-west-3",
    "endpoint_url": "http://localhost:9000"
}
pl.scan_parquet("s3://foobar/**/*.parquet",storage_options=storage_options).collect()

Log output

Traceback (most recent call last):
  File "c:\Dev\polars_experiments\polars_error.py", line 10, in <module>
    pl.scan_parquet("s3://foobar/**/*.parquet",storage_options=storage_options).collect()
  File "C:\Users\n.vandeginste\AppData\Roaming\Python\Python311\site-packages\polars\lazyframe\frame.py", line 2027, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: Generic S3 error: Error performing list request: Client error with status 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><BucketName>foobar</BucketName><Resource>/foobar</Resource><RequestId>17EFA18B16CC07DC</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>

Issue description

When querying a local S3 endpoint served by MinIO, the list request is rejected with 403 Forbidden (SignatureDoesNotMatch). MinIO is running locally, without Docker.
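
For context on the SignatureDoesNotMatch error: with AWS Signature Version 4, the server recomputes the request signature from its own copy of the secret key and rejects the request if the result differs from what the client sent. The sketch below shows only the signing-key derivation from the public SigV4 scheme. The function names and the string-to-sign are placeholders for illustration, and this is not how Polars or its underlying S3 client implement signing.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(secret_key: str, date: str, region: str, string_to_sign: str) -> str:
    """Sign a string with the derived key and return the hex signature."""
    key = sigv4_signing_key(secret_key, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder string-to-sign; a real one is built from the canonical request.
placeholder = "AWS4-HMAC-SHA256\n20240101T000000Z\n20240101/eu-west-3/s3/aws4_request\n<hash>"

client_sig = sign("secret-known-to-client", "20240101", "eu-west-3", placeholder)
server_sig = sign("secret-known-to-server", "20240101", "eu-west-3", placeholder)

# Any difference in secret, date, region, or service changes the signature.
print(client_sig == server_sig)
```

The same mismatch appears if the client and server disagree on the region or on any part of the canonical request, so the error alone does not identify which input differs.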

Expected behavior

The query works.

Installed versions

```
--------Version info---------
Polars: 1.5.0
Index type: UInt32
Platform: Windows-10-10.0.22631-SP0
Python: 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib: 3.8.4
nest_asyncio: 1.6.0
numpy: 1.26.4
openpyxl:
pandas: 2.2.2
pyarrow: 17.0.0
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```

Kuinox commented 3 months ago

@ritchie46 sorry for the ping, but I believe the last comment is phishing/malware.

Edit: it looks like the comment is now gone.

Bidek56 commented 2 months ago

This works fine for me with a local quay.io/minio/minio container.

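import polars as pl

# Credentials and endpoint for the local MinIO container
storage_options = {
    "aws_access_key_id": "accesskey",
    "aws_secret_access_key": "secretkey",
    "endpoint_url": "http://localhost:9000",
}

s3url = "s3://test-bucket/dates.parquet"

# Read a single Parquet object through the S3-compatible endpoint
df = pl.scan_parquet(s3url, storage_options=storage_options).collect()
print(f"{df.describe()=}")
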
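
One visible difference from the reproducible example in the report: that one passes aws_session_token but no aws_secret_access_key, while this one passes a secret key. Whether that explains the 403 is not confirmed anywhere in this thread. The sketch below only flags which static-credential keys are absent from a storage_options dict; the helper name and the REQUIRED_STATIC_KEYS tuple are hypothetical and not part of Polars.

```python
# Hypothetical helper, not a Polars API: lists static-credential keys that are
# missing (or empty) in a storage_options dict.
REQUIRED_STATIC_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(storage_options: dict) -> list:
    """Return the required keys that are absent or empty in storage_options."""
    return [key for key in REQUIRED_STATIC_KEYS if not storage_options.get(key)]


# Options as written in the original report (values redacted there).
report_options = {
    "aws_access_key_id": "redacted",
    "aws_session_token": "redacted",
    "aws_region": "eu-west-3",
    "endpoint_url": "http://localhost:9000",
}

# Options from the working example in this comment.
working_options = {
    "aws_access_key_id": "accesskey",
    "aws_secret_access_key": "secretkey",
    "endpoint_url": "http://localhost:9000",
}

print(missing_credential_keys(report_options))   # ['aws_secret_access_key']
print(missing_credential_keys(working_options))  # []
```

The two snippets also differ in the path (a glob over the bucket versus a single object) and in whether aws_region is set, so the missing key is only one candidate to rule out.
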
Kuinox commented 2 months ago

I ran the MinIO binaries directly on Windows. Do you want a packet capture or something like that?