Closed SpacemanPaul closed 4 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
2ee5434
) 85.65% compared to head (8ee99fb
) 85.62%. Report is 2 commits behind head on develop-1.9.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical
It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical
Yeah, I thought about that but I couldn't see how to do it without significantly complicating the contract between the abstract base class and the implementations. Definitely something that could be revisited though.
I think we're better off without this. The added complexity both internally and in the API isn't worth it. The extra argument requires more explanation and make it less clear what the functions do.
Python is full of iterator functions (especially range()
and enumerate()
), that users need to call list()
on if they want it fully loaded into memory. I think we should do the same with search results.
We could include examples in the documentation of using list()
on the results, if we think that's a valuable thing for people to do. (I'm not sure if it is.)
search_by_products()
is quite a different method, so maybe it could be different. I'm not sure how and how much it is used.
I think we're better off without this.
Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.
The main justification is the deprecation/removal of dc.index.datasets.search_eager()
- which is just an alias for list(dc.index.datasets.search()
- I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.
Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.
Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.
The main justification is the deprecation/removal of
dc.index.datasets.search_eager()
- which is just an alias forlist(dc.index.datasets.search()
- I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.
I completely support removing search_eager()
for being confusing, and for simplifying the API.
If it's mostly for in tests, I even more think that we should be calling list()
explicitly. It's such a common idiom and makes it clearer what is happening. It's also helpful for users of the API as documentation.
Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.
Sorry!
I do think short term pain over long term pain and support though. It's hopefully not too hard to remove from the next PR.
Reason for this pull request
Add
fetch_all: bool = False
argument to all (non-deprecated) dataset search methods, as per EP-13.Proposed changes
fetch_all: bool
argument (defaulting toFalse
) added to all non-deprecated dataset search methods. When true, all search results are gathered and returned in a fully instantiated list. When false (the default), the method acts as a generator and yields a result at a time.Notes:
yield
statement in a method/function makes it act as a generator - you can't decide dynamically whether to yield or return from a function. Basic form looks like:The
search_by_products()
method has two layers of lazy evaluation (in generator mode, yields tuples containing a product and a nested generator that yields datasets. Thefetch_all=True
version fully instantiates both layers.docs/about/whats_new.rst
for all changes