icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
210 stars 107 forks source link

Modularizing authentication #433

Closed rwegener2 closed 11 months ago

rwegener2 commented 1 year ago

The Problem

The authentication method that exists right now is inherently tied to the Query class because the .earthdata_login() method is a method on that class. That has led to passing ._session values from Query objects to Variables objects as needed.

As the cloud read effort develops, additional modules, such as Read, will also need to authenticate. Because of this I would like to propose that we make authentication in the repository more modular, so it can be more easily extended to different classes.

Possible Solutions

Both of the ideas below create an EarthdataAuth class. This class would have the existing earthdata_login() functionality within it.

1 - Inheritance

We could create an class called EarthdataAuth that holds the ._auth, ._session, and ._s3login_credentials attributes that currently get created in Query. Query, Variables, and other classes could then be subclasses of the EarthdataAuth superclass.

Pros/Cons: After some reading it seems that isn't exactly the type of relationship we want to convey. The set of complexity added by a new level of inheritance feels disproportionate to the fairly simple need to keep track of 2-3 attributes and one function. The pro of this solution, however, is that I believe we could it up such that nothing about the user experience changes.

2 - Creating EarthdataAuth as a class that other objects have as an attribute

Query and other related objects could have an .auth attribute that is an instance of EarthdataAuth. EarthdataAuth can get created either by calling the earthdata_login() function, or automatically when a user needs authentication. The user (or the internal code) would access attributes like ipx.Query.auth._session.

Pros/Cons: I think this solution could simplify our internal code and at least for me it is more intuitive than inheritance (that may just be me, though). The con of this solution is that it would change how the user authenticates, either by removing that step all together or by requiring they switch to ipx.Query.auth.earthdata_login().

Questions

Code

JessicaS11 commented 1 year ago

Thanks for this great writeup @rwegener2! Of the two solutions you propose, I would definitely lean towards the second one. We actually not too recently removed our own Earthdata module, instead building out a similar set of auth methods directly in earthaccess (and also gaining the improved cloud and multi-DAAC auth already built in to that library). In that line of thinking, I wonder if - rather than create an internal EarthdataAuth class - we could simply leverage the earthaccess auth object directly, more or less as is done in Query.earthdata_login(). Then the challenge turns back into your comment below.

Is there a reason that we prefer to ask users to run the earthdata_login() function themself?

The short answer is this is a legacy from the early days of programmatic data access. In many cases, particularly for tutorials, the primary mode of authentication was interactive (and there was no icepyx or earthaccess wrapping the manual sending of auth info with a request and response sequence, so users needed to add their credentials to the dict themselves). Thus, even once we added automatic auth options we wanted it to be clear to the user this was happening and didn't want to break existing workflows, so we did not make earthdata_login() private. However, if the user calls Query.download() and has not logged in, this will be called automatically, so they are not explicitly required to run it themselves.

Are we interested in setting up the workflow such that the user is prompted for login credentials only when the execute a function that requires authentication? I think that could happen fairly easily on our end, but it would change the user experience.

I think this is the way to go, and I think we could implement it without changing the user experience. For classes that might need auth, we can include it as part of the init as you showed in the Variables module in your linked commit. Rather than directly passing in a ._session (as is done now), the auth object can be passed and we can use auth.get_session(). This will also help us be better positioned to adopt more earthaccess functionality in the future (e.g. using it to interact with CMR, which we currently still do "manually" by sending requests).