nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License
416 stars 84 forks source link

`UnboundLocalError` in `earthaccess.get_s3fs_session` #248

Closed jrbourbeau closed 1 year ago

jrbourbeau commented 1 year ago

Using the latest earthaccess=0.5.2 release

fs = earthaccess.get_s3fs_session()

raised the following error

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[48], line 1
----> 1 fs = earthaccess.get_s3fs_session()
      2 fs

File ~/mambaforge/envs/earthaccess/lib/python3.9/site-packages/earthaccess/api.py:289, in get_s3fs_session(daac, provider)
    281 def get_s3fs_session(
    282     daac: Optional[str] = None, provider: Optional[str] = None
    283 ) -> s3fs.S3FileSystem:
    284     """Returns a fsspec s3fs file session for direct access when we are in us-west-2
    285 
    286     Returns:
    287         class s3fs.S3FileSystem: an authenticated s3fs session valid for 1 hour
    288     """
--> 289     session = earthaccess.__store__.get_s3fs_session(daac=daac, provider=provider)
    290     return session

File ~/mambaforge/envs/earthaccess/lib/python3.9/site-packages/earthaccess/store.py:142, in Store.get_s3fs_session(self, daac, concept_id, provider)
    137 delta_minutes = now - self.initial_ts
    138 # TODO: test this mocking the time or use https://github.com/dbader/schedule
    139 # if we exceed 1 hour
    140 if (
    141     self.s3_fs is None or round(delta_minutes.seconds / 60, 2) > 59
--> 142 ) and s3_credentials is not None:
    143     self.s3_fs = s3fs.S3FileSystem(
    144         key=s3_credentials["accessKeyId"],
    145         secret=s3_credentials["secretAccessKey"],
    146         token=s3_credentials["sessionToken"],
    147     )
    148     self.initial_ts = datetime.datetime.now()

UnboundLocalError: local variable 's3_credentials' referenced before assignment

Because the s3_credentials variable may not be defined if it doesn't fall into one of the if/elif blocks:

https://github.com/nsidc/earthaccess/blob/54b688b906776f5c845483dd00676f6c681feb10/earthaccess/store.py#L129-L135

I'm not sure if we can still return an s3fs instance in this case, or if raising an informative error is the correct approach

betolink commented 1 year ago

Hi @jrbourbeau this error (earthaccess should have a better error handling) is because get_s3fs_session() requires either a DAAC or the DAAC provider as each NASA DAAC distribute their data under different AWS IAM roles and each DAAC uses their own AWS temporary credentials endpoint. If we were accessing HLS data from LPDAAC we would need

fs = earthaccess.get_s3fs_session("LPDAAC")
# fs can now open/copy data from the LPDAAC S3 buckets (for an hour)

Note: S3 (direct access) can only be used from us-west-2

If you're using earthaccess to search for data, earthaccess will use the returned metadata from CMR(NASA's search API) to select the proper set of credentials automatically.

results = earthaccess.search_data(
  short_name="HLSS30",
  bounding_box=(55.0,68.0,-48.0,71.0)
)
file_handlers = earthaccess.open(results)

if we execute this in the cloud, file_handlers will be a list of s3fs file instances initialized with the AWS temporary credentials for LPDAAC (the NASA data center that owns the S3 bucket for this dataset)

jrbourbeau commented 1 year ago

Okay, thanks for the explanation @betolink. I pushed a PR here https://github.com/nsidc/earthaccess/pull/249 that raises a more informative error for this case