Closed rwegener2 closed 9 months ago
:point_left: Launch a binder notebook on this branch for commit 703d832f0803d1b94790e53a32455d90d0494f1c
I will automatically update this comment whenever this PR is modified
:point_left: Launch a binder notebook on this branch for commit 61b20069cdc3e836f444c4cd556c0c2638eaaf2b
:point_left: Launch a binder notebook on this branch for commit ba52c55ece92d15060b2c138d863e3c0b9ee5e9f
This PR prompted two prior PRs: #444 and #451. I think those updates will are exciting for enabling the goal of this PR, allowing the Variables class to list s3 data variables. The changes in those two prior have made the approach to this goal is quite different than when this PR was opened (it was more complicated back then!). As a result I'm going to close this PR and open a new one to pursue adding s3 url reads to the Variables class.
What was done
At the moment the
Variables
class.avail()
method would fail if given an s3 url. This PR expands the class to handle this case.How it was done
In addition to
file
andorder
a newvartype
,nsidc-s3
, was added to the class.nsidc
was specified (instead of justs3
) because extracting the product and version from the filepath relies on the nsidc naming structure. So, you couldn't just giveVariables
the s3 path to any IS2 file anywhere on AWS; it needs to be from thensidc-cumulus-prod-protected
bucket. If others have feedback on this decision I'm happy to hear. We could alternately try to extract the product and version from the metadata, which would remove this limitation.Within the
.avail()
method thensidc-s3
filetype uses the same variable extraction strategy asorder
. An interesting note about this is that if we use theorder
method of grabbing variables it happens very quickly. It's the time to ping an API endpoint and parse the request. The.avail()
function gets very slow with cloud data when trying to walk the full file (as happens when using thefile
argument), but we can avoid that by using theorder
method of extracting variables instead.Todo / Next steps
This PR is pretty close, but blocked by determining how to pass authentication (#433). After auth is addressed this PR can be merged into the
getvars
branch, where work has already started changing the way theVariables
class is called inRead
andQuery
.parse
as a library requirement