Closed robertdj closed 1 month ago
It turns out that I can make this work if I use PyArrow:

```python
import pyarrow.dataset as ds
import pyarrow.fs as fs
import polars as pl

pyfs = fs.S3FileSystem(endpoint_override="https://fra1.digitaloceanspaces.com")
pyds = ds.dataset(source="mybucket/test.parquet", filesystem=pyfs, format="parquet")
df = pl.scan_pyarrow_dataset(pyds).collect()
```
But it would be nice if it worked directly with Polars :-)
There is an `aws_endpoint_url` key in `storage_options` where you can set your custom endpoint. Works for me.
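A minimal sketch of that suggestion, assuming the endpoint and bucket names from this thread (the `scan_parquet` call is commented out because it needs network access and real credentials):

```python
# Endpoint/region as discussed above; substitute your own values.
storage_options = {
    "aws_region": "us-east-1",  # DigitalOcean Spaces reportedly expects us-east-1
    "aws_endpoint_url": "https://fra1.digitaloceanspaces.com",
}

# Credentials are picked up from ~/.aws/credentials; uncomment to run:
# import polars as pl
# df = pl.scan_parquet("s3://mybucket/test.parquet", storage_options=storage_options).collect()
```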
Works great, thanks!
Description
I'm trying to use Polars to read a parquet file stored in DigitalOcean Spaces, which is an S3-compatible storage service. It works with the boto3 package, but I can't make it work with Polars.
I have set `access_key_id` and `secret_access_key` in `~/.aws/credentials`. I can list the contents of the bucket with boto3. Note that the `endpoint_url` is specified. In Spaces I have a bucket called `mybucket` containing a file called `test.parquet`. (Apparently the `aws_region` should be fixed to `us-east-1` for DigitalOcean.)

I get an error
If I specify the bucket more elaborately to be

I get a different error suggesting that the endpoint is hard-coded to `s3.amazonaws.com`.