Read arguments - Githubissues

rwegener2 commented 10 months ago

What was Done

This PR changes the input parameters for the Read module in two ways:

combine the filepath_pattern and data_source arguments into a single parameter (maintained the name data_source).
we are no longer going to require the product argument from the user. It will still be optional, both for the purposes of deprecation, and also to be used in the event that a user provides a list of files that do not all contain the same product.

The options for the new version of thedata_source parameter are:

a string to a single local file
a string to a directory
a glob string
a list of files

Unsupported options for data_source are:

user provides a list of directories
user provides a Query object (a question about this in the Notes section below)

How It was Done

The majority of the changes are all in the __init__ method. The overall flow of the method is now:

raise warnings for depreciated arguments
create a filelist from whatever the data_source input parameter is
do a bunch of checks and raise warnings for lists with multiple products or a user-specified product that doesn't match the file metadata
assign a product to the object

Other notable changes:

product and filelist are now public properties

Depreciation comments

`product`

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

Notes & Questions

We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?
If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.
In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See note above in Depreciation Comment. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Todo

[ ] use a mock filesystem to test filelist creation
[x] docstrings

github-actions[bot] commented 10 months ago

:point_left: Launch a binder notebook on this branch for commit 612662e3ffc9395bc90557c4fdaba0a88500b068

I will automatically update this comment whenever this PR is modified

:point_left: Launch a binder notebook on this branch for commit c16a00359034b5920780ce0f7719ea6c9000891d

:point_left: Launch a binder notebook on this branch for commit 76480783444165519b66590b31e99248b27ad35b

:point_left: Launch a binder notebook on this branch for commit 4cfbfdbd7cda2701ab3036cd381cd16f7f889c6c

:point_left: Launch a binder notebook on this branch for commit 203f3adafc085bce44d0fd993b5932f60366f33c

:point_left: Launch a binder notebook on this branch for commit 10d15910a1f82ec2b18ba61eff647e243d44651c

:point_left: Launch a binder notebook on this branch for commit 6f5beadcb8bca332fbf9fbda79310e075bbd8af3

:point_left: Launch a binder notebook on this branch for commit 035ee5ab1e33dc43a471c36286a59c036d0cb292

:point_left: Launch a binder notebook on this branch for commit 903c351007bc7245bf4d107b48b1f38c67acf900

:point_left: Launch a binder notebook on this branch for commit d842bde28825570112b08264ecef33e47d8b21a8

:point_left: Launch a binder notebook on this branch for commit 5e06de94da0c492b4b3cc94a5b4d834b0dc829a4

:point_left: Launch a binder notebook on this branch for commit 9ca29f134efa8bdb8126113c5e232738e6032cce

:point_left: Launch a binder notebook on this branch for commit e8e35ad58a7b7cc65c7e74f287e6eb706a324ecd

:point_left: Launch a binder notebook on this branch for commit 4bcc5187c36d46cfcdf990a93acf0bef8f36204b

:point_left: Launch a binder notebook on this branch for commit b2c273518bee22f8f1417d51cdaf3a39958bc56e

:point_left: Launch a binder notebook on this branch for commit 6b953f9c63151c20a5224dc9e38db5b73dae02f8

:point_left: Launch a binder notebook on this branch for commit 45704a4e94368b7fe64466908aaff45fdc78f5db

:point_left: Launch a binder notebook on this branch for commit 2bf28086f00bab031390bf85cc81baa5257e8bce

:point_left: Launch a binder notebook on this branch for commit 12428817b9445daba3933e5a75d885d622aef1ad

:point_left: Launch a binder notebook on this branch for commit 5f8589a2700cb0dc1b53726a91d9f5b67acd8189

rwegener2 commented 10 months ago

Ok @JessicaS11 and/or @weiji14, this PR is ready for comments. I am still working on 1) updating docstrings and 2) writing a few tests, but in the meantime I'm interested in initial feedback. Specifically, I left 3 questions in the Notes & Questions section of the writeup.

Thanks!

JessicaS11 commented 10 months ago

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

You make a compelling argument, and if we provide a few good glob examples (or links to them) in the notebook then we are hopefully showing users what they'd need to know. I think then I would be okay with deprecating the user-input product argument.

We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?

Correct! So if we're adding cloud support specifically in a next step, then we can bookmark adding this there.

If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.

See comment above - this would be an example glob use case I'd want to include.

In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See note above in Depreciation Comment. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Deprecated = discouraged in my book. Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

rwegener2 commented 10 months ago

Thanks for the comments @JessicaS11!

Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

I did some quick reading about depreciation recommendations. What you mention here, of informing users of in what version the feature will be depreciated, is considered best practice. To the "how long should you wait to depreciate" question, I noticed that people seem to mark time not by number of months but by versions. One SO comment summarized "Major version releases are a good time to remove deprecated methods. Minor releases should typically not contain breaking changes." Other common themes included communicating with users, ex. in warnings (I think icepyx does this well) or blog posts.

Have you considered a major release to v1.0.0 at some point? Either the integration of cloud reading or the Argo project that you're working on (or both) seem like big changes that could perhaps prompt a major release. The big changes you made to authentication could also be part of what gets announced. The major version change could then be a good time to remove many features that have been maintained only for backward compatibility.

review-notebook-app[bot] commented 10 months ago

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

rwegener2 commented 10 months ago

Ok, this PR is ready for review! Since the last comment I updated the documentation and also added glob_kwargs parameter that can be used to recursively search directories. The thing I haven't been able to figure out is mocking a filesystem for testing with h5py reads. I've spent a few hours trying to get something set up to no avail. This morning I posted a question on Stack Overflow, so if that works out I'll follow up.

Looking forward to a review!

rwegener2 commented 10 months ago

@JessicaS11 not sure what your timeline is for the remainder of this week, but this is also ready for re-review. You're already reviewed once, so hopefully it's mostly there.

If you don't get to it don't worry about it. I can create new branches off of this branch for the next few weeks!

rwegener2 commented 9 months ago

All comments addressed. I'm looking for a 👍🏻 about how the glob docs section was phrased. Then we are just waiting to see what happens with the visualization errors caused by the OpenAltimetry API (cc @JessicaS11).

JessicaS11 commented 9 months ago

Looks great, @rwegener2. I'll hold off on approving until #454 is fixed so there's not an accidental merge, but this is good to go!

JessicaS11 commented 9 months ago

@all-contributors please add @rwegener2 for bug, code, doc, ideas, maintenance, review, test, tutorial please add @jpswinski for review please add @whyjz for tutorial

allcontributors[bot] commented 9 months ago

@JessicaS11

I've put up a pull request to add @rwegener2! :tada:

I've put up a pull request to add @jpswinski! :tada:

I've put up a pull request to add @whyjz! :tada:

icesat2py / icepyx

Read arguments #444

What was Done

How It was Done

Depreciation comments

`product`

Notes & Questions

Todo