Closed rwegener2 closed 8 months ago
:point_left: Launch a binder notebook on this branch for commit e3566f8765610b1f239ad46eb951d326c1666e07
I will automatically update this comment whenever this PR is modified
A visual guide I was using to think about the structures we have available to users
Outcomes of a discussion with @JessicaS11:
- Do we think it is important to maintain backwards compatibility, given that this is a class that has, until now, been for internal use only? --> `DeprecationError`
- What are the most user-friendly input arguments? We will have a `filepath` argument. `self.variables` is an attribute of a Query object, so implementing Query as an input arg would be circular. We won't aggregate these into a single `data_source` argument. Instead, make all arguments optional, but enforce that either a `filepath` or a `product` argument (but not both) is given.
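The either-or argument rule could be sketched as a small validation helper. The function and return values here are hypothetical illustrations of the rule discussed above, not icepyx's actual implementation:

```python
def validate_source(filepath=None, product=None):
    """Enforce that exactly one of `filepath` or `product` is supplied.

    Hypothetical sketch of the argument rule; the real logic inside
    icepyx's Variables class may differ.
    """
    if (filepath is None) == (product is None):
        # Both given, or neither given: the data source is ambiguous.
        raise ValueError("Supply either a filepath or a product (but not both).")
    return "file" if filepath is not None else "product"
```

Comparing the two `is None` checks for equality covers both failure modes (none given, or both given) in a single branch.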
Re: the docs: the plan is to use the existing page for nested variables, but restructure it so it starts by talking generally about the Variables class and then transitions to showing how the class can also be used directly from a Query object.
This PR is almost totally ready for review. While running all the example notebooks I ran into one bug that I'm looking for some help with.
On line 479 of `variables.py` there is an if/else statement:
478 if self._vartype == "order":
479 nec_varlist = [
480 "sc_orient",
481 "sc_orient_time",
(...)
488 "end_delta_time",
489 ]
490 elif self._vartype == "file":
In this new iteration of Variables there is no more vartype. If the vartype used to be "order", now `Variables` just takes in the product and version and generates a variables list from that. So my question is: what was the purpose of using different sets of `nec_varlist`s based on the vartype? I'm trying to figure out if this can be modified to run without knowing a vartype. cc: @JessicaS11
If it is helpful, I found this while running the "Building the default variables list" section of the Subsetting notebook. The lines:
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)
Raised the error:
AttributeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 region_a.order_vars.append(defaults=True)
2 pprint(region_a.order_vars.wanted)
File ~/icepyx/icepyx/core/variables.py:478, in Variables.append(self, defaults, var_list, beam_list, keyword_list)
475 self._check_valid_lists(vgrp, allpaths, var_list, beam_list, keyword_list)
477 # add the mandatory variables to the data object
478 if self._vartype == "order":
479 nec_varlist = [
480 "sc_orient",
481 "sc_orient_time",
(...)
488 "end_delta_time",
489 ]
490 elif self._vartype == "file":
AttributeError: 'Variables' object has no attribute '_vartype'
> So my question is: what was the purpose of using different sets of `nec_varlist`s based on the vartype?
Fundamentally, the `Variables` instance from within `query` is of vartype "order", while the `Variables` instance from `read` is of vartype "file". For users using variable subsetting to order data, we wanted to make sure that they got back some minimal set of variables they'd need (or appreciate having) to work with the data (but wouldn't necessarily order on their own initially) without needing to place another data order. This was likely more important when the data was just coming out and there was no a priori knowledge for most lab groups on what variables/groups/paths they might want. From the read side, there are a few pieces of information we need to be able to index/manipulate/merge across granules, so this allowed us to mandate that they're included.
(As an aside: the non-implemented interactions of a `query.variables` instantiation with a type "file" confuse this. If memory serves, they were the start of making it possible for the user to create a query object by supplying an ICESat-2 granule (i.e. file), I think with the goal that they could then still use the variables module to explore the variables in the file. It's probably worth my trying to find some more background on this idea (EDIT: #34) and, I'm guessing, removing this non-implemented code from query to clean things up.)
> I'm trying to figure out if this can be modified to run without knowing a vartype
I think the short answer is "yes". From the `read` end, we can pretty easily add our own list with an `.append(vars_we_want)` call before we start loading. For the `query` (so on-prem subsetting) usage, the user directly constructs their `vars.wanted` and supplies it via the `coverage` kwarg to `query.order(coverage=[])` or `query.download(coverage=[])`, so we'd have to somehow make sure the defaults were included when a user does variable subsetting.
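One way to guarantee the defaults are included could be sketched as a standalone helper. The function name and the trimmed-down default list below are illustrative assumptions, not icepyx's actual code (the real `nec_varlist` in `variables.py` runs from `sc_orient` through `end_delta_time`):

```python
# Illustrative subset of the mandatory variables; the real list in
# variables.py is longer.
NEC_VARLIST = ["sc_orient", "sc_orient_time", "cycle_number", "rgt"]

def coverage_with_defaults(coverage):
    """Prepend any mandatory variables missing from a user-built coverage list.

    The user's own ordering is preserved; only absent defaults are added.
    """
    missing = [v for v in NEC_VARLIST if v not in coverage]
    return missing + list(coverage)
```

Merging just before the order is placed (rather than mutating `vars.wanted`) keeps the user's explicit selection visible and untouched.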
I suppose one alternative would be to take the minimal list required for the `read.load` application and use that for all cases (thus removing a few of the `vartype="order"` defaults). I think I would prefer the per-variables-object model described above, though, because I suspect that it will get us closer to generating (for instance) Andy's hackweek list of variables and ultimately leaves things more flexible.
Alright, I've marked this as "Ready for Review", although it isn't 100% complete. I'm getting a bit turned around about how to remove the `nec_varlist` from Variables. I think I did it fine for the Read class, but I'm getting lost tracing the `Coverage` parameter for the Query class. Instead of surging ahead on Query, I thought it would be worth requesting review of the current state and then having a conversation to outline how to update the Query class.
Okay, this is ready for review! I've been updating the PR description so it is up to date with test examples and description. Two comments:

- Read required `cycle_number` and `rgt`, but Query didn't require those. That felt odd, because I couldn't read a file with icepyx that I downloaded with icepyx, so I added those two variables to the Query required `var_list`. If there is a reason that isn't a good idea, let me know and I can take them out.
- The `append` method: I didn't understand 100% of the parts I was modifying, so I'm just requesting an extra look at that part.

Looking forward to the review!
P.S. I would like everyone to know that I did a global search for "depreciate" before committing. Look how far I've come 😛
Goal
The ability to list the available variables and their h5 paths is a unique and powerful feature of icepyx. This PR aims to make that first-order functionality, instead of functionality that is only used internally. I think this will be a useful user feature, and I think it makes sense to make this change before adding the s3 url capability.
How it was done
We anticipate that the most common use case is reading variables from a file. Alternately, all you need to produce an `.avail()` list of variables is a product and version. This information is used to ping an API, which returns the variables.

When using `Variables()` the user is required to supply EITHER a `path` OR a `product`. The version argument is optional; if it is not supplied, the latest version is assumed. If both `path` and `product` are given, an error is raised.

How it can be tested (updated)
And testing that Read and Query still work:
Notes
- A user-supplied `var_list` is appended to the wanted variables list (`vars.wanted`) during `load`. If `vars.wanted` does not exist, an error is raised instructing the user to add variables before loading the dataset.
- The `Coverage` kwarg is passed into `order_granules` (commonly: `region_a.order_granules(Coverage=region_a.order_vars.wanted)`). Required variables will get appended to `Coverage` in the `subsetparams` method of `Query`.

Open questions (Now resolved)
Reading a dataset without first adding variables used to raise an error (`UnboundLocalError: local variable 'groups_list' referenced before assignment`). Now the code loads a dataset object that is basically empty. Do we want this? --> We will raise a new descriptive error if the user has not added any variables before reading.
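That resolution could look something like the following sketch. The error class name, check function, and message wording are my guesses for illustration, not the merged code:

```python
class NoVariablesError(Exception):
    """Raised when a read is attempted before any variables were requested."""

def check_wanted(wanted):
    """Fail loudly if the wanted-variables collection is empty or missing.

    Intended to run at the top of a load/read path so the user gets a
    descriptive error instead of a silently empty dataset.
    """
    if not wanted:
        raise NoVariablesError(
            "No variables have been added to this Variables object. "
            "Use `.vars.append(...)` to select variables before reading."
        )
```

Raising before the read starts means the user never sees a confusing empty dataset object.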