icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
210 stars 107 forks source link

Refactor authentication #435

Closed rwegener2 closed 1 year ago

rwegener2 commented 1 year ago

What is done

This PR refactors Earthdata authentication to make it more extensible to additional classes that may want to authenticate. Related discussion #433

How it was done

Here I'm continuing the discussion from #433. @JessicaS11 made the good point that instead of creating our own EarthdataAuth class we should just try to reference the auth object that comes back from earthaccess.login(). (Ex. by setting a self.auth = earthaccess.login() variable on all the classes that require authentication. Generally like this:

def Query():
    def __init__():
        self.auth = earthaccess.login()

I experimented with this but eventually realized the library would try to regenerate the objects very time self.auth.get_s3_credentials(daac="NSIDC") or self.auth.get_session() was accessed. For session that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time. We could do separate attributes (self.session or self.s3login_credentials) and a @property getter method to get around this, but that was starting to feel like a lot of code to copy and paste into every class that needed auth. The other major problem is that I don't think we want to make users login when the Query object is created. I think we'd like to let them run Queries and search for data, and only require login once the user tries to .order_granules().

To avoid these challenges I created a mixin class. This differs from an inherited class in that it is built to extend the functionality of other classes with a small, specific set of attributes or functions. The EarthdataAuthMixin contains properties for auth, session, and s3login_credentials. These set themselves (login) the first time they are accessed, and reference the previously created object in subsequent accesses. A bonus of this mixin method is that I was able to keep the .earthdata_login() method, which should make it backwards compatible. The mixin could be added to any class that needs authentication and the user doesn't need to keep track of if s3 credentials or a regular session are needed.

I'm curious what others think of this technique. It's a half step more complicated than the self.auth=earthaccess.login() technique. It felt to me like we gained in modularity and functionality enough to warrant the complication, but I'd love additional opinions.

How it can be tested

Both the following code snippets should work:

import icepyx as ipx

region_a = ipx.Query('ATL06',[-45, 74, -44,75],['2019-11-30','2019-11-30'], \
                           start_time='00:00:00', end_time='23:59:59')

region_a.order_vars.avail()   # works without logging in!

region_a.order_granules()  # also works without logging in!

# try running these multiple times to notice that they store the previously created objects. Also notice that
# once `.session` creates the auth object, `.s3login_credentials` doesn't try to repeat logging in
region_a.session
region_a.s3login_credentials

region_a.earthdata_login()  # still works, even though it's not needed
from icepyx.core.variables import Variables

var_obj = Variables("file", path='./data/ATL06/processed_ATL06_20191201105502_10010505_006_01.h5',
          product='ATL06')

var_obj.auth
var_obj.session

Todo

github-actions[bot] commented 1 year ago

Binder :point_left: Launch a binder notebook on this branch for commit 4564b3b7fa0e0c762a51ead546dd68fac24e7b2d

I will automatically update this comment whenever this PR is modified

Binder :point_left: Launch a binder notebook on this branch for commit 16b8d3f880a07b5c1b3bf2d6fd50176753d9bdf7

Binder :point_left: Launch a binder notebook on this branch for commit 083426c0ab152c8034b85c60ccf6a8e5f6ef7419

Binder :point_left: Launch a binder notebook on this branch for commit edfa36247765d0a862f2906ade4c2751843554bc

Binder :point_left: Launch a binder notebook on this branch for commit 72c53473c527eb594c337e6de03ccef777dc4017

Binder :point_left: Launch a binder notebook on this branch for commit 434bbf24d61305a50dcf2bd67e32231d58f2d0ac

Binder :point_left: Launch a binder notebook on this branch for commit e06765acd385cd2400bec5c4b74ab4f4e5fb3276

Binder :point_left: Launch a binder notebook on this branch for commit 9e1f74551e1f6ecfd602e0603019912b08ccd02b

Binder :point_left: Launch a binder notebook on this branch for commit a2a455fb698055b17548d6063e4ffccd9c8f28c3

Binder :point_left: Launch a binder notebook on this branch for commit 1aacb1dcf33722779b926e27f1748ccca911c171

Binder :point_left: Launch a binder notebook on this branch for commit 50db05e4e7935cc0f82619251fd131e9c2d9c3f6

Binder :point_left: Launch a binder notebook on this branch for commit 12e21ba0018887c5b36e3a92fe416ade157048b0

Binder :point_left: Launch a binder notebook on this branch for commit 10bc7347a5e85a9e391a380278abfffa19cac0a2

Binder :point_left: Launch a binder notebook on this branch for commit 8033ed4fccdd521529f337779c402620bb0448c7

Binder :point_left: Launch a binder notebook on this branch for commit 6cdddbfced5735984e748748d0dadc9daeafc615

Binder :point_left: Launch a binder notebook on this branch for commit b7b8b7e007162702c325803577f266021bb2407f

Binder :point_left: Launch a binder notebook on this branch for commit 1172e9ed3aa2c00278fc047ef419d63d18ce3c19

Binder :point_left: Launch a binder notebook on this branch for commit abd950a97b6eca5f0d4fc80db9a01a1f84cd4634

Binder :point_left: Launch a binder notebook on this branch for commit 90354e64a65b57acad88373ecee75264a00a474b

Binder :point_left: Launch a binder notebook on this branch for commit 7fedc2dd4ca3c28381f76378241daf98fdaf053f

Binder :point_left: Launch a binder notebook on this branch for commit e26c9a13c6c6d024968b1dc060742451ccfb5ca3

Binder :point_left: Launch a binder notebook on this branch for commit a31b092493e0c4be5354f25b0bf8026feb6b6f8f

Binder :point_left: Launch a binder notebook on this branch for commit 3aab79a492d45ed333aa84e19908c90f3d960dc3

Binder :point_left: Launch a binder notebook on this branch for commit ade942fde832d5d8fde0ca9c82fc4c2d6a7e4b35

Binder :point_left: Launch a binder notebook on this branch for commit f052b7632284621f4dfb9eade8bf2c5d03214673

Binder :point_left: Launch a binder notebook on this branch for commit 63b0275aa3de3e55785b426718beed542104ae39

Binder :point_left: Launch a binder notebook on this branch for commit 1796a2c0427cce7606c869542bbd4a14d154d431

Binder :point_left: Launch a binder notebook on this branch for commit 4d7687b4f5035378a09bcb8da6662a89ce3140a1

Binder :point_left: Launch a binder notebook on this branch for commit 38380443c5bd37c317f91cf065bc7b16b04fe843

Binder :point_left: Launch a binder notebook on this branch for commit 693a9fa31f8454f5ee57709ee6b67ab6d17f4908

JessicaS11 commented 1 year ago

@betolink I'm curious if you have any thoughts on this approach for centralizing auth in icepyx so that it can easily be picked up across Query, Variables, and Read objects. Pulling out some relevant points from @rwegener2's great writeup above:

we should just try to reference the auth object that comes back from earthaccess.login(). (Ex. by setting a self.auth = earthaccess.login() variable on all the classes that require authentication

but...

the library would try to regenerate the objects very time self.auth.get_s3_credentials(daac="NSIDC") or self.auth.get_session() was accessed. For session that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time.

She developed this Mixin approach, but we're still left with (assuming this isn't still outstanding in earthaccess too)

how can we allow the user to regenerate new s3 creds after an hour has passed? Right now there isn't a way to do that.

Do you have any thoughts?

betolink commented 1 year ago

Hi @JessicaS11 @rwegener2 this is still an outstanding issue with earthaccess the library tries to optimize and cache the s3fs session but we haven't implemented anything like this at the get_s3_credentials() level yet.

For icepyx, this mixing class looks good! maybe when we instantiate the class for the first time we could store the timestamp and return the same cached set of credentials until the time delta goes above say 50 minutes, a little like the get_s3fs_session() method.

rwegener2 commented 1 year ago

Thanks for the feedback @betolink!

I've cleaned up the PR and added:

The one question I had for @betolink:

We can talk about if it makes sense to merge this / how the best way to that.

UPDATE: Just adding that after a conversation with Luis it sounds like earthaccess does not currently propagate the error message back to the user. I think it's fine for icepyx to raise the error and direct users to view the earthaccess output. No further action needed here.

rwegener2 commented 1 year ago

Current status

  1. All code comments have been addressed
  2. I’ve written a few tests for the class

In writing those tests I did also end up changing a few lines of test_behind_NSIDC_API_login.py. There was a fixture in that class that writes a .netrc file. I moved that block to the pytest configuration so that the .netrc gets written before all the tests and can also be used by the new authentication class.

Todo

  1. update docs

I’m going to pause on this PR for the rest of the day, but the last thing I’ve got on the todo list is updating the documentation to reflect that the user is no longer required to explicitly login. After that I'll re-request review.

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

rwegener2 commented 1 year ago

Ok, I think this PR is basically ready for a full review! I have just one doubt:

I've still gone ahead and re-requested review; I'll get the API reference pushed as soon as I hear back. Thank you!

JessicaS11 commented 1 year ago
  • how could I create one of those for the new class? Are they programmatically generated, or written out by hand?

Somewhere in between, unfortunately. I think the most appropriate spot for this is probably as one of the components, so we can just copy-paste a new block into that .rst file. If we now need to access auth info via query_object.auth, we should probably include a pointer function to get users from query_object.earthdata_login to query_object.auth.earthdata_login. In that function our docstring should note the deprecation of the function so users see that in the API docs.

rwegener2 commented 1 year ago

Okay @JessicaS11, I think this is ready for another check. I left comments next to yours, but to summarize:

we should probably include a pointer function to get users from query_object.earthdata_login to query_object.auth.earthdata_login. In that function our docstring should note the deprecation of the function so users see that in the API docs.

I left the Query.earthdata_login method in the documentation, which has the note about being depreciated in it.

JessicaS11 commented 1 year ago

@rwegener2 I'm running the basic data access notebook locally and... it fails. I've got valid authentication (it did show an error before bringing up the manual login box, but it did let me authenticate that way and I got valid credentials), but cannot actually do anything that's behind a login. The two failure cases I found are: region_a.order_vars.avail() and region_a.order_granules(), both of which are getting hung up in the requests stage.

UPDATE: I should have known better than to try and test icepyx code on a Wednesday. NSIDC system is down for maintenance (see banner at: https://nsidc.org/data/icesat-2). Wednesday is often their maintenance day...

rwegener2 commented 1 year ago

I'm running the basic data access notebook locally and... it fails.

Ok, I'm trying it out now. @JessicaS11

I also get a failure, but mine is happening on

#search for available granules and provide basic summary info about them
region_a.avail_granules()

due to AssertionError: Your search returned no results; try different search parameters. Are you changing something about the default search parameters to locate more data? I just opened IS2_data_access.ipynb and did Restart and Run All Cells. I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.

JessicaS11 commented 1 year ago

I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.

There are a few examples in there that are set up to fail to demo things for the user. With Run All Cells you're likely hitting up against one of those.

rwegener2 commented 1 year ago

I just:

Let me know if anything comes up while running the code @JessicaS11 🤞🏻

rwegener2 commented 1 year ago

Yay, soooo exiting! Thanks for all of your attentive review @JessicaS11! Merging now! 🎊