nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

consider alternative names for `search_data` and `search_datasets` #770

Open: itcarroll opened this issue 1 month ago

itcarroll commented 1 month ago

> Maybe we should think about the top-level API; are these good names for these functions? Why not use common terminology like `search_collections` and `search_granules`? "Dataset" is often used to refer to a single data file. 🤔

~ @mfisher87 in #769

Since "granules" is not a very generic term either, an option borrowed from the STAC spec could be `search_collections` vs. `search_items`.

Seems like a Milestone 1.0 change, though...

chuckwondo commented 1 month ago

I agree that we should consider alternative function names. My preference would be to use the prefix `find_` instead of `search_`, but I'm not fussed about it.

mfisher87 commented 1 month ago

I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?

chuckwondo commented 1 month ago

> I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?

Aside from it being 2 letters shorter, in previous contexts I've often seen DB client APIs use `find` (or `find_by_X`) as a naming convention, so anecdotally it is arguably more consistent with other tools. However, `search` may be equally widely used, so again, I don't have any overtly strong preference. It's just a mild preference, perhaps more personal than logical.

mfisher87 commented 1 month ago

Thanks for expounding!

betolink commented 1 month ago

I like the idea of aligning with STAC; a while ago Scott suggested that, and I think it'll be valuable in reducing users' cognitive load. The one thing I'm wary of is deprecating existing names. I think we should try not to break the API to the extent possible, while encouraging people to use the new conventions. See: https://github.com/nsidc/earthaccess/discussions/221

asteiker commented 1 month ago

I know this has caused confusion in the past, even as others at NSIDC have come up to speed on the library, so I fully support updated names here! I like what @itcarroll proposed to align with STAC. We may also want to consider the language most commonly used within the NASA Earthdata ecosystem. You can't have Earthdata Search without "search", for example, so I'd be more keen on "search" than "find". As an aside, this seems like a great use case for #761 too.

chuckwondo commented 1 month ago

Just to clarify, is this the current proposal?

* rename `search_datasets` to `search_collections`

* rename `search_data` to `search_items`

If so, +1 from me.

mfisher87 commented 1 month ago

> As an aside, this seems like a great use case for https://github.com/nsidc/earthaccess/issues/761 too.

:rocket: :100:

> Just to clarify, is this the current proposal?
>
> * rename `search_datasets` to `search_collections`
>
> * rename `search_data` to `search_items`
>
> If so, +1 from me.

+1

> The one thing I'm wary of is deprecating existing names. I think we should try not to break the API to the extent possible, while encouraging people to use the new conventions

I do worry that having multiple aliases for common features could lead to confusion, as people might think they do different things. I really like having "one correct way". That said, I do believe a long deprecation period would be in order for top-level API changes.

We probably need deeper discussions about how to communicate the time remaining until deprecation. Should we always include a minimum date in our deprecation messages, e.g. `DeprecationWarning("... Obsoletion will occur no sooner than YYYY-MM-DD.")`?

Related #766
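Here is a minimal sketch of the dated deprecation message suggested above. It assumes a hypothetical helper, `deprecated_alias`, that wraps a renamed function under its old name. The new name `search_items` comes from the proposal in this thread, and the date is a placeholder rather than a commitment.

```python
import datetime
import functools
import warnings


def deprecated_alias(new_func, old_name, not_before):
    """Return a callable named `old_name` that warns, then forwards to `new_func`.

    `not_before` is the earliest date the old name may be removed; it is
    embedded in the warning so users know how long they have to migrate.
    """
    message = (
        f"`{old_name}` is deprecated; use `{new_func.__name__}` instead. "
        f"Obsoletion will occur no sooner than {not_before.isoformat()}."
    )

    @functools.wraps(new_func)
    def alias(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller, not this wrapper.
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)

    # functools.wraps copied the new name; restore the old one so that
    # repr() and help() report the deprecated name.
    alias.__name__ = old_name
    alias.__qualname__ = old_name
    return alias


# Hypothetical new function (stub body) and its deprecated alias.
def search_items(**kwargs):
    return kwargs


search_data = deprecated_alias(
    search_items, "search_data", not_before=datetime.date(2030, 1, 1)  # placeholder date
)
```

Calling `search_data(short_name="ATL06")` returns the same result as `search_items(short_name="ATL06")`, and it emits a warning ending in "no sooner than 2030-01-01." One caveat applies to the choice of warning class. Since Python 3.7 (PEP 565), the default filters show `DeprecationWarning` only when it is triggered directly by code in `__main__`. The Python docs recommend `FutureWarning` for deprecations intended for end users.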

andypbarrett commented 1 month ago

I like the alignment of earthaccess terminology with STAC. "Collections" already aligns between STAC and NASA-speak. However, as a newbie to STAC lingo, I find the term "items" unclear.

chuckwondo commented 1 month ago

> I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?

> Aside from it being 2 letters shorter, in previous contexts I've often seen DB client APIs use `find` (or `find_by_X`) as a naming convention, so anecdotally it is arguably more consistent with other tools. However, `search` may be equally widely used, so again, I don't have any overtly strong preference. It's just a mild preference, perhaps more personal than logical.

@mfisher87, after looking at the proposed new names again (`search_collections` and `search_items`), I now have an arguably better reason for preferring `find_collections` and `find_items`: the name `search_collections` is ambiguous about the type of "things" it will find. Does it search within the available collections to find things in them, or does it search for collections?

This analogy might be a bit of a stretch, but consider security procedures at a venue or event, where people may be subject to a "bag search." In that context, nobody is searching for bags; they are searching within bags (for banned "items"). Likewise, security staff running a `search_bags` function wouldn't expect the result to be a "list of bags," but rather a list of banned items found within the given bags.

So I would argue that a "collection search" implemented by a function named `search_collections` could easily be misread as a search for items within collections rather than a search for collections. At the least, it could leave someone wondering which interpretation is correct, if they recognize the ambiguity. The name `find_collections` arguably eliminates that ambiguity by stating explicitly what we expect to find: collections. (Similarly for `find_items`.)

> I like the alignment of earthaccess terminology with STAC. "Collections" already aligns between STAC and NASA-speak. However, as a newbie to STAC lingo, I find the term "items" unclear.

@andypbarrett, I agree that "items" is perhaps too generic for many folks. Although "collections" is perhaps no less generic a term, anecdotally, it may feel more specific to most folks dealing with this information. I don't have any particular preference or suggestion for a better term than "items," but if you have any suggestions, please share so we can "vote" on it here.

mfisher87 commented 1 month ago

> So I would argue that a "collection search" implemented by a function named `search_collections` could easily be misread as a search for items within collections rather than a search for collections. At the least, it could leave someone wondering which interpretation is correct, if they recognize the ambiguity. The name `find_collections` arguably eliminates that ambiguity by stating explicitly what we expect to find: collections. (Similarly for `find_items`.)

:100: This is an excellent point. I'm on team find now :)