ukaea / fair-mast

A data management system for Tokamak data
MIT License

Credentials error raised when trying to read data #74

Closed: Simon-McIntosh closed this issue 1 month ago

Simon-McIntosh commented 1 month ago

A credentials error is raised when trying to access a dataset via the to_dask method.

import appdirs
import intake

catalog = intake.open_catalog("https://mastapp.site/intake/catalog.yml")
url = "s3://mast/level1/shots/30467.zarr/amc"
dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir()}
)
dataset.to_dask()

...

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:24 in handler
    return await self.sign(operation_name, request)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:90 in sign
    auth.add_auth(request)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\botocore\auth.py:423 in add_auth
    raise NoCredentialsError()

NoCredentialsError: Unable to locate credentials

Simon-McIntosh commented 1 month ago

Setting anon=True seems to fix the NoCredentialsError, but I now get a KeyError: '.zmetadata'.

dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir(), "s3": {"anon": True}}
)
dataset.to_dask()

...

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\convenience.py:1360 in open_consolidated
    meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:3046 in __init__
    meta = json_loads(self.store[metadata_key])

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:1448 in __getitem__
    raise KeyError(key) from e

KeyError: '.zmetadata'

samueljackson92 commented 1 month ago

Hi @Simon-McIntosh

NoCredentialsError: Unable to locate credentials

By default, s3fs expects an access key and secret, so we explicitly set anon=True in the intake catalog. When you pass your own storage_options to change the cache directory, you overwrite those defaults, including anon=True, hence the credentials error.

KeyError: '.zmetadata'

The cause of this is similar to the first issue. The catalog also sets the endpoint_url of our storage by default, and your override drops it. Adding it back into the options you're overriding works for me:

dataset = catalog.level1.sources(
    url=url,
    storage_options={
        "cache_storage": "/tmp",
        "s3": {"anon": True, "endpoint_url": "https://s3.echo.stfc.ac.uk"},
    },
)
dataset.to_dask()

We'll get the temporary path fixed to something more sensible. This is a nice example of how thin access layers let you work around a bug...
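
Until the catalog is updated, one way to avoid dropping the defaults is to merge your overrides into them instead of replacing them. The following is only a minimal sketch: `merge_storage_options` and `CATALOG_S3_DEFAULTS` are hypothetical names, not part of intake or fair-mast, and the default values are simply the ones from the working call in this thread, so the real catalog may define more.

```python
from copy import deepcopy

# Assumed defaults, copied from the working call in this thread.
# The actual catalog may set additional keys.
CATALOG_S3_DEFAULTS = {
    "s3": {"anon": True, "endpoint_url": "https://s3.echo.stfc.ac.uk"},
}


def merge_storage_options(defaults, overrides):
    """Return a new dict: defaults recursively updated with overrides.

    Hypothetical helper; nested dicts are merged key by key so that
    overriding one option does not discard its siblings.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_storage_options(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Change only the cache directory; anon and endpoint_url are preserved.
storage_options = merge_storage_options(
    CATALOG_S3_DEFAULTS, {"cache_storage": "/tmp"}
)
```

The resulting `storage_options` dict can then be passed to `catalog.level1.sources(url=url, storage_options=storage_options)` in place of the hand-written one.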

Simon-McIntosh commented 1 month ago

Thanks, this fixes it.