Closed rwegener2 closed 1 year ago
:point_left: Launch a binder notebook on this branch for commit 4564b3b7fa0e0c762a51ead546dd68fac24e7b2d
I will automatically update this comment whenever this PR is modified
:point_left: Launch a binder notebook on this branch for commit 16b8d3f880a07b5c1b3bf2d6fd50176753d9bdf7
:point_left: Launch a binder notebook on this branch for commit 083426c0ab152c8034b85c60ccf6a8e5f6ef7419
:point_left: Launch a binder notebook on this branch for commit edfa36247765d0a862f2906ade4c2751843554bc
:point_left: Launch a binder notebook on this branch for commit 72c53473c527eb594c337e6de03ccef777dc4017
:point_left: Launch a binder notebook on this branch for commit 434bbf24d61305a50dcf2bd67e32231d58f2d0ac
:point_left: Launch a binder notebook on this branch for commit e06765acd385cd2400bec5c4b74ab4f4e5fb3276
:point_left: Launch a binder notebook on this branch for commit 9e1f74551e1f6ecfd602e0603019912b08ccd02b
:point_left: Launch a binder notebook on this branch for commit a2a455fb698055b17548d6063e4ffccd9c8f28c3
:point_left: Launch a binder notebook on this branch for commit 1aacb1dcf33722779b926e27f1748ccca911c171
:point_left: Launch a binder notebook on this branch for commit 50db05e4e7935cc0f82619251fd131e9c2d9c3f6
:point_left: Launch a binder notebook on this branch for commit 12e21ba0018887c5b36e3a92fe416ade157048b0
:point_left: Launch a binder notebook on this branch for commit 10bc7347a5e85a9e391a380278abfffa19cac0a2
:point_left: Launch a binder notebook on this branch for commit 8033ed4fccdd521529f337779c402620bb0448c7
:point_left: Launch a binder notebook on this branch for commit 6cdddbfced5735984e748748d0dadc9daeafc615
:point_left: Launch a binder notebook on this branch for commit b7b8b7e007162702c325803577f266021bb2407f
:point_left: Launch a binder notebook on this branch for commit 1172e9ed3aa2c00278fc047ef419d63d18ce3c19
:point_left: Launch a binder notebook on this branch for commit abd950a97b6eca5f0d4fc80db9a01a1f84cd4634
:point_left: Launch a binder notebook on this branch for commit 90354e64a65b57acad88373ecee75264a00a474b
:point_left: Launch a binder notebook on this branch for commit 7fedc2dd4ca3c28381f76378241daf98fdaf053f
:point_left: Launch a binder notebook on this branch for commit e26c9a13c6c6d024968b1dc060742451ccfb5ca3
:point_left: Launch a binder notebook on this branch for commit a31b092493e0c4be5354f25b0bf8026feb6b6f8f
:point_left: Launch a binder notebook on this branch for commit 3aab79a492d45ed333aa84e19908c90f3d960dc3
:point_left: Launch a binder notebook on this branch for commit ade942fde832d5d8fde0ca9c82fc4c2d6a7e4b35
:point_left: Launch a binder notebook on this branch for commit f052b7632284621f4dfb9eade8bf2c5d03214673
:point_left: Launch a binder notebook on this branch for commit 63b0275aa3de3e55785b426718beed542104ae39
:point_left: Launch a binder notebook on this branch for commit 1796a2c0427cce7606c869542bbd4a14d154d431
:point_left: Launch a binder notebook on this branch for commit 4d7687b4f5035378a09bcb8da6662a89ce3140a1
:point_left: Launch a binder notebook on this branch for commit 38380443c5bd37c317f91cf065bc7b16b04fe843
:point_left: Launch a binder notebook on this branch for commit 693a9fa31f8454f5ee57709ee6b67ab6d17f4908
@betolink I'm curious if you have any thoughts on this approach for centralizing auth in icepyx so that it can easily be picked up across Query, Variables, and Read objects. Pulling out some relevant points from @rwegener2's great writeup above:
we should just try to reference the auth object that comes back from earthaccess.login(). (Ex. by setting a self.auth = earthaccess.login() variable on all the classes that require authentication
but...
the library would try to regenerate the objects very time self.auth.get_s3_credentials(daac="NSIDC") or self.auth.get_session() was accessed. For session that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time.
She developed this Mixin approach, but we're still left with (assuming this isn't still outstanding in earthaccess
too)
how can we allow the user to regenerate new s3 creds after an hour has passed? Right now there isn't a way to do that.
Do you have any thoughts?
Hi @JessicaS11 @rwegener2 this is still an outstanding issue with earthaccess
the library tries to optimize and cache the s3fs
session but we haven't implemented anything like this at the get_s3_credentials()
level yet.
For icepyx, this mixing class looks good! maybe when we instantiate the class for the first time we could store the timestamp and return the same cached set of credentials until the time delta goes above say 50 minutes, a little like the get_s3fs_session()
method.
Thanks for the feedback @betolink!
I've cleaned up the PR and added:
earthaccess.login()
succeeded, and raises an error if it did notThe one question I had for @betolink:
earthaccess.login()
return the error message anywhere other than printing it to output? If it does I can pass that along to the user, but if not I think printed output is fine.We can talk about if it makes sense to merge this / how the best way to that.
UPDATE: Just adding that after a conversation with Luis it sounds like earthaccess does not currently propagate the error message back to the user. I think it's fine for icepyx to raise the error and direct users to view the earthaccess output. No further action needed here.
In writing those tests I did also end up changing a few lines of test_behind_NSIDC_API_login.py
. There was a fixture in that class that writes a .netrc file. I moved that block to the pytest configuration so that the .netrc gets written before all the tests and can also be used by the new authentication class.
I’m going to pause on this PR for the rest of the day, but the last thing I’ve got on the todo list is updating the documentation to reflect that the user is no longer required to explicitly login. After that I'll re-request review.
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Ok, I think this PR is basically ready for a full review! I have just one doubt:
EarthdataAuthMixin
. I see the query.rst
, read.rst
, etc. files, but how could I create one of those for the new class? Are they programmatically generated, or written out by hand?I've still gone ahead and re-requested review; I'll get the API reference pushed as soon as I hear back. Thank you!
- how could I create one of those for the new class? Are they programmatically generated, or written out by hand?
Somewhere in between, unfortunately. I think the most appropriate spot for this is probably as one of the components
, so we can just copy-paste a new block into that .rst file. If we now need to access auth info via query_object.auth
, we should probably include a pointer function to get users from query_object.earthdata_login
to query_object.auth.earthdata_login
. In that function our docstring should note the deprecation of the function so users see that in the API docs.
Okay @JessicaS11, I think this is ready for another check. I left comments next to yours, but to summarize:
.session
syntax. This is how the user would in theory access them, if needed (although we hope they never actually need to). Using the public properties instead of the private ones is what enables the auto-checking to see if a session has already been created, and what creates a session if there isn't one already. Going forward I don't expect anything other than the EarthdataAuthMixin to access the private properties.self.session
syntax is sufficient. self.auth.session
isn't needed. Let me know if either of the last two bullets don't make sense.we should probably include a pointer function to get users from
query_object.earthdata_login
toquery_object.auth.earthdata_login
. In that function our docstring should note the deprecation of the function so users see that in the API docs.
I left the Query.earthdata_login
method in the documentation, which has the note about being depreciated in it.
@rwegener2 I'm running the basic data access notebook locally and... it fails. I've got valid authentication (it did show an error before bringing up the manual login box, but it did let me authenticate that way and I got valid credentials), but cannot actually do anything that's behind a login. The two failure cases I found are: region_a.order_vars.avail()
and region_a.order_granules()
, both of which are getting hung up in the requests stage.
UPDATE: I should have known better than to try and test icepyx code on a Wednesday. NSIDC system is down for maintenance (see banner at: https://nsidc.org/data/icesat-2). Wednesday is often their maintenance day...
I'm running the basic data access notebook locally and... it fails.
Ok, I'm trying it out now. @JessicaS11
I also get a failure, but mine is happening on
#search for available granules and provide basic summary info about them
region_a.avail_granules()
due to AssertionError: Your search returned no results; try different search parameters
. Are you changing something about the default search parameters to locate more data? I just opened IS2_data_access.ipynb
and did Restart and Run All Cells. I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.
I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.
There are a few examples in there that are set up to fail to demo things for the user. With Run All Cells you're likely hitting up against one of those.
I just:
s3token
parameter in the earthdata_login
function to None
Let me know if anything comes up while running the code @JessicaS11 🤞🏻
Yay, soooo exiting! Thanks for all of your attentive review @JessicaS11! Merging now! 🎊
What is done
This PR refactors Earthdata authentication to make it more extensible to additional classes that may want to authenticate. Related discussion #433
How it was done
Here I'm continuing the discussion from #433. @JessicaS11 made the good point that instead of creating our own EarthdataAuth class we should just try to reference the
auth
object that comes back fromearthaccess.login()
. (Ex. by setting aself.auth = earthaccess.login()
variable on all the classes that require authentication. Generally like this:I experimented with this but eventually realized the library would try to regenerate the objects very time
self.auth.get_s3_credentials(daac="NSIDC")
orself.auth.get_session()
was accessed. Forsession
that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time. We could do separate attributes (self.session
orself.s3login_credentials
) and a@property
getter method to get around this, but that was starting to feel like a lot of code to copy and paste into every class that needed auth. The other major problem is that I don't think we want to make users login when the Query object is created. I think we'd like to let them run Queries and search for data, and only require login once the user tries to.order_granules()
.To avoid these challenges I created a mixin class. This differs from an inherited class in that it is built to extend the functionality of other classes with a small, specific set of attributes or functions. The
EarthdataAuthMixin
contains properties forauth
,session
, ands3login_credentials
. These set themselves (login) the first time they are accessed, and reference the previously created object in subsequent accesses. A bonus of this mixin method is that I was able to keep the.earthdata_login()
method, which should make it backwards compatible. The mixin could be added to any class that needs authentication and the user doesn't need to keep track of if s3 credentials or a regular session are needed.I'm curious what others think of this technique. It's a half step more complicated than the
self.auth=earthaccess.login()
technique. It felt to me like we gained in modularity and functionality enough to warrant the complication, but I'd love additional opinions.How it can be tested
Both the following code snippets should work:
Todo