Problem with reading notebooks directly from S3 Buckets

nteract / scrapbook

A library for recording and reading data in notebooks.

https://nteract-scrapbook.readthedocs.io

BSD 3-Clause "New" or "Revised" License


Problem with reading notebooks directly from S3 Buckets #52

Closed: victorluizgomes closed this issue 4 years ago

victorluizgomes commented 5 years ago

I am trying to read notebooks from an S3 bucket that lives in a non-AWS, S3-compatible object store, using:

import scrapbook as sb
book = sb.read_notebooks('s3://my-bucket/')

But I get an error saying my Access Key Id does not exist:

ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.

I can access the bucket contents from boto3 successfully.

Am I missing anything here in the configuration with respect to endpoint url?

MSeal commented 5 years ago

We use boto3 under the hood as well. It's using boto3.session.Session from papermill's iorw.s3.py, so it should behave identically. The error suggests it found a key, but the service rejected it as invalid. Did you have a temporary key that expired?

victorluizgomes commented 5 years ago

OK, no, the key is still valid.

I can access my bucket with the key from the command line using aws s3 ls, but only when I pass the --endpoint-url option, because the bucket is located in a non-AWS, S3-compatible object store. My worry is that scrapbook never gets that endpoint.

I can also access my bucket from boto3 by overriding the endpoint URL on the boto3 client myself, with:

s3 = boto3.client('s3', endpoint_url='https://my-url.com/')

But I cannot get at the boto3 client that sb.read_notebooks() creates under the hood, so I have nowhere to set the endpoint URL.

Maybe I am thinking about this the wrong way? Or is there any way I can reach into Papermill and change the default endpoint URL so it does not go to AWS, before calling sb.read_notebooks()?

Thank you.

MSeal commented 5 years ago

Ahh yes, overriding the URL requires you to register a different handler for the S3 scheme. While scrapbook is still using papermill's IO layer to resolve these URIs, you can follow https://papermill.readthedocs.io/en/latest/extending-overview.html to register an override for S3 paths that sets or passes the options you need for your setup.
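
For illustration, here is a minimal sketch of what such an override might look like. It assumes the handler interface described in the papermill extension docs (an object with read, listdir, write and pretty_path, registered via papermill_io.register); the names EndpointS3Handler, split_s3_path and register_handler are hypothetical, and the endpoint URL is the placeholder from earlier in this thread. It has not been tested against a real object store, so treat it as a starting point rather than a drop-in fix.

```python
import urllib.parse

# Placeholder endpoint from the thread; replace with your object store's URL.
ENDPOINT_URL = "https://my-url.com/"


def split_s3_path(path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urllib.parse.urlparse(path)
    return parsed.netloc, parsed.path.lstrip("/")


class EndpointS3Handler:
    """Hypothetical papermill IO handler that talks to a custom S3 endpoint.

    `client` is expected to behave like boto3.client('s3', endpoint_url=...).
    """

    def __init__(self, client):
        self.client = client

    def read(self, path):
        # Return the notebook's JSON text.
        bucket, key = split_s3_path(path)
        body = self.client.get_object(Bucket=bucket, Key=key)["Body"]
        return body.read().decode("utf-8")

    def listdir(self, path):
        # Return full s3:// paths for every object under the prefix.
        bucket, prefix = split_s3_path(path)
        paginator = self.client.get_paginator("list_objects_v2")
        paths = []
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                paths.append(f"s3://{bucket}/{obj['Key']}")
        return paths

    def write(self, buf, path):
        bucket, key = split_s3_path(path)
        self.client.put_object(Bucket=bucket, Key=key, Body=buf.encode("utf-8"))

    def pretty_path(self, path):
        return path


def register_handler(endpoint_url=ENDPOINT_URL):
    """Register the handler for 's3://' paths (assumes boto3 and papermill are installed)."""
    import boto3
    from papermill.iorw import papermill_io

    client = boto3.client("s3", endpoint_url=endpoint_url)
    # Registering again for the same scheme is intended to take precedence
    # over the built-in S3 handler.
    papermill_io.register("s3://", EndpointS3Handler(client))
```

With that in place, the idea would be to call register_handler() once and then call sb.read_notebooks('s3://my-bucket/') as before. Passing the client into the handler, instead of creating it inside, keeps the endpoint configuration in one spot and lets you exercise the handler with a stub client.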