icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
203 stars 101 forks source link

WIP expand Variables class to handle s3 urls from NSIDC #434

Closed rwegener2 closed 9 months ago

rwegener2 commented 12 months ago

What was done

At the moment the Variables class .avail() method would fail if given an s3 url. This PR expands the class to handle this case.

How it was done

In addition to file and order a new vartype, nsidc-s3, was added to the class. nsidc was specified (instead of just s3) because extracting the product and version from the filepath relies on the nsidc naming structure. So, you couldn't just give Variables the s3 path to any IS2 file anywhere on AWS; it needs to be from the nsidc-cumulus-prod-protected bucket. If others have feedback on this decision I'm happy to hear. We could alternately try to extract the product and version from the metadata, which would remove this limitation.

Within the .avail() method the nsidc-s3 filetype uses the same variable extraction strategy as order. An interesting note about this is that if we use the order method of grabbing variables it happens very quickly. It's the time to ping an API endpoint and parse the request. The .avail() function gets very slow with cloud data when trying to walk the full file (as happens when using the file argument), but we can avoid that by using the order method of extracting variables instead.

Todo / Next steps

This PR is pretty close, but blocked by determining how to pass authentication (#433). After auth is addressed this PR can be merged into the getvars branch, where work has already started changing the way the Variables class is called in Read and Query.

github-actions[bot] commented 12 months ago

Binder :point_left: Launch a binder notebook on this branch for commit 703d832f0803d1b94790e53a32455d90d0494f1c

I will automatically update this comment whenever this PR is modified

Binder :point_left: Launch a binder notebook on this branch for commit 61b20069cdc3e836f444c4cd556c0c2638eaaf2b

Binder :point_left: Launch a binder notebook on this branch for commit ba52c55ece92d15060b2c138d863e3c0b9ee5e9f

rwegener2 commented 9 months ago

This PR prompted two prior PRs: #444 and #451. I think those updates will are exciting for enabling the goal of this PR, allowing the Variables class to list s3 data variables. The changes in those two prior have made the approach to this goal is quite different than when this PR was opened (it was more complicated back then!). As a result I'm going to close this PR and open a new one to pursue adding s3 url reads to the Variables class.