microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/

NSRDB example notebook fails on Hub (requires auth) #180

Closed gjoseph92 closed 6 months ago

gjoseph92 commented 1 year ago

When trying to run the National Solar Radiation Database example notebook on the PC hub, I get a couple of errors:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [22], line 7
      4 from adlfs import AzureBlobFileSystem
      6 # Not used directly, but used by xarray to read .h5 files
----> 7 import h5pyd, h5netcdf
      9 # Year to investigate and plot
     10 year = 2015

ModuleNotFoundError: No module named 'h5pyd'

Fine, let's see if we can just list the blobs at least.

fs = AzureBlobFileSystem(account_name=storage_account_name)
annual_files = fs.glob(folder + '/*.h5')
print('Found {} annual files:'.format(len(annual_files)))
for k in range(0,10):
    print(annual_files[k])
print('...')
ClientAuthenticationError: Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:88a674e3-101e-001a-2e82-39eeab000000
Time:2023-02-05T16:54:22.8859881Z
ErrorCode:NoAuthenticationInformation
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:88a674e3-101e-001a-2e82-39eeab000000
Time:2023-02-05T16:54:22.8859881Z</Message></Error>

So some form of auth (which isn't automatically set up for me on the Hub) is required to access the bucket?

FWIW, trying this on a whim didn't work:

pc.sas.get_token("nrel", "nrel-nsrdb")

HTTPError: 404 Client Error: Not Found for url: https://planetarycomputer.microsoft.com/api/sas/v1/token/nrel/nrel-nsrdb

Also, following the link in the "Mounting the Container" section (which seems to include a pre-generated SAS token?) gives

<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:a6a276ee-701e-006e-1783-39da5b000000 Time:2023-02-05T17:03:43.1837858Z</Message>
<AuthenticationErrorDetail>Signature did not match. String to sign used was /blob/nrel/$root nrel-nsrdb-ro 2020-08-04 c </AuthenticationErrorDetail>
</Error>

TomAugspurger commented 1 year ago

Thanks for the report. That dataset predates our SAS API, hence the pre-generated SAS token (which has perhaps expired).

I added that storage container to our SAS endpoint, so https://planetarycomputer.microsoft.com/api/sas/v1/token/nrel/nrel-nsrdb should work now (there might be some caching, so give it ~5-10 minutes).
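
For reference, this is the same URL that `pc.sas.get_token("nrel", "nrel-nsrdb")` requested in the 404 traceback above, so it can also be queried directly. A minimal sketch using only the standard library: the helper names are made up for illustration, and the assumption that the response is JSON with a `token` field should be checked against the SAS API documentation.

```python
import json
import urllib.request

SAS_TOKEN_ENDPOINT = "https://planetarycomputer.microsoft.com/api/sas/v1/token"


def sas_token_url(account: str, container: str) -> str:
    """Build the token URL for a storage account and container (hypothetical helper)."""
    return f"{SAS_TOKEN_ENDPOINT}/{account}/{container}"


def fetch_sas_token(account: str, container: str, timeout: float = 30.0) -> str:
    """Request a SAS token; raises urllib.error.HTTPError if the endpoint returns 404.

    Assumption: the response body is JSON with a "token" field.
    """
    with urllib.request.urlopen(sas_token_url(account, container), timeout=timeout) as resp:
        return json.load(resp)["token"]


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(sas_token_url("nrel", "nrel-nsrdb"))
```

A 404 from `fetch_sas_token` would indicate that the container is not (yet) registered with the endpoint, which matches the error reported earlier in this thread.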

Just an FYI, since you mentioned the Hub, that dataset is in the East US region (the hub is in West Europe).

I think that xr.open_dataset(fsspec.open(url).open()) should work for reading the data over the network without h5pyd.
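
A sketch of that approach, assuming the blob is addressed by its HTTPS URL with the SAS token appended as the query string. `signed_blob_url` and `open_remote_h5` are hypothetical names, and `engine="h5netcdf"` is an assumption (h5netcdf has to be installed, but h5pyd does not). The `phony_dims="sort"` argument is borrowed from the snippet later in this thread.

```python
def signed_blob_url(account: str, container: str, blob_path: str, sas_token: str) -> str:
    """Return an HTTPS blob URL with the SAS token as its query string."""
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{blob_path.lstrip('/')}?{sas_token.lstrip('?')}"
    )


def open_remote_h5(url: str):
    """Open a remote HDF5 file with xarray by reading it over HTTP."""
    # Imported here so the URL helper above works without these packages.
    import fsspec
    import xarray as xr

    f = fsspec.open(url).open()  # file-like object that reads from the URL
    return xr.open_dataset(f, engine="h5netcdf", phony_dims="sort")
```

Usage would look like `open_remote_h5(signed_blob_url("nrel", "nrel-nsrdb", blob_path, token))`, where `blob_path` is the path of one `.h5` file inside the container and `token` is the string returned by the SAS endpoint.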

gjoseph92 commented 1 year ago

Thanks @TomAugspurger. With the SAS fix, this seems to work on the Hub:

import xarray as xr
import pandas as pd
import planetary_computer as pc
from adlfs import AzureBlobFileSystem

# Storage resources
storage_account = 'nrel'
container = 'nrel-nsrdb'

token = pc.sas.get_token(storage_account, container)

fs = AzureBlobFileSystem(storage_account, sas_token=token.token)
annual_files = fs.glob(container + '/v3/*.h5')
print('Found {} annual files:'.format(len(annual_files)))
# Slice so this also works when fewer than 10 files match
for path in annual_files[:10]:
    print(path)
print('...')

f = fs.open(annual_files[-1])
ds = xr.open_dataset(f, chunks="auto", phony_dims='sort')

ds.air_temperature[0, 0].compute()

Might be worth updating the notebook to mention this. (Also, it looks like this only goes up to 2020, so maybe this isn't actively maintained in the first place.)
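
Since `annual_files[-1]` relies on the glob result being sorted, one way to pick the most recent year explicitly is to parse it from the file name. This is a rough sketch that assumes each annual file has a four-digit year in its name. The file names in the example are hypothetical, not taken from the container listing.

```python
import re
from typing import Optional, Sequence

# A standalone four-digit year starting with 19 or 20
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def latest_annual_file(paths: Sequence[str]) -> Optional[str]:
    """Return the path whose file name contains the largest year, or None."""
    best_path = None
    best_year = -1
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # look at the file name only
        match = YEAR_PATTERN.search(name)
        if match and int(match.group(0)) > best_year:
            best_year, best_path = int(match.group(0)), path
    return best_path


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    example = ["nrel-nsrdb/v3/example_2019.h5", "nrel-nsrdb/v3/example_2020.h5"]
    print(latest_annual_file(example))
```

With the listing from the snippet above, `fs.open(latest_annual_file(annual_files))` would then replace `fs.open(annual_files[-1])`.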

TomAugspurger commented 1 year ago

I'll get the notebook updated, and I'm checking on the status of newer data.

TomAugspurger commented 1 year ago

https://github.com/microsoft/AIforEarthDataSets/pull/38 updated the notebook. I think nbviewer caches stuff for a while, but it'll eventually be updated there as well.

TomAugspurger commented 6 months ago

Fixed by https://github.com/microsoft/AIforEarthDataSets/pull/38