stac-utils / pystac-client

Python client for searching STAC APIs
https://pystac-client.readthedocs.io

Handling HTTP errors in search.items() generator #712

Open christophfriedrich opened 1 month ago

christophfriedrich commented 1 month ago

I'm searching a STAC catalog and then iterating over the results with the items() generator:

def get_search_result(bbox, start, end):
    catalog = stac.open("https://earth-search.aws.element84.com/v1")
    return catalog.search(
        max_items = None,
        collections = ['sentinel-2-l2a'],
        bbox = bbox,
        datetime = [start+'T00:00:00Z', end+'T00:00:00Z'],
    )

search = get_search_result(bbox, start, end)
for item in search.items():
    # download needed assets
    # process them into product
    ...

It's quite a lengthy loop, as each iteration takes about a minute (I don't know if that is relevant).

The other day, about 20 minutes into the loop, my worker crashed with a RemoteDisconnected error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hsnb/./server-worker.py", line 385, in run_worker
    for item in search.items():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 691, in items
    for item in self.items_as_dicts():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 702, in items_as_dicts
    for page in self.pages_as_dicts():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 734, in pages_as_dicts
    for page in self._stac_io.get_pages(
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 307, in get_pages
    page = self.read_json(link, parameters=parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac/stac_io.py", line 205, in read_json
    txt = self.read_text(source, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 162, in read_text
    return self.request(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 218, in request
    raise APIError(str(err))
pystac_client.exceptions.APIError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Apparently something went wrong during the communication with the server. Until today, I didn't even know that each yielding of the next item issues another HTTP request, but of course that makes sense, as all the details of that item have to be fetched.

That one time it failed -- happens.

But how to handle this? Adding a try ... except around the loop would certainly be smart and at least save my worker from a total crash. But it would still throw me out of the loop. I think it would be nice if pystac_client would automatically retry failed requests one or two times?
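
The point about being thrown out of the loop can be reproduced without any network access. The sketch below uses a made-up generator, `flaky_pages` (a stand-in for illustration, not part of pystac_client), that fails while producing its second page. It shows that once a Python generator has raised, it is finished and cannot be resumed, so a retry has to happen either inside the request layer or by restarting the iteration.

```python
def flaky_pages():
    """Hypothetical stand-in for a paginated search: the second page request fails."""
    yield "page-1"
    raise ConnectionError("Remote end closed connection without response")
    yield "page-2"  # never reached


pages = flaky_pages()
received = []
try:
    for page in pages:
        received.append(page)
except ConnectionError as err:
    print(f"request failed: {err}")

# A generator that has raised is finished: iterating it again yields nothing.
leftover = list(pages)
print(received, leftover)
```

Catching the exception keeps the worker alive, but the remaining results are lost unless the search is started again.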

Something similar seems to have been discussed recently in #680. That discussion ended with "not planned", because the issue was not seen on the pystac_client side. Maybe this example gives a new perspective on the topic?

jsignell commented 1 month ago

Thanks for taking the time to write this up. If you include your import lines and the bbox, start, and end then I can try to reproduce your example, but at first glance I suspect that you could get around this issue by getting the search results first and then doing the per-item work. Something like:

items = list(get_search_result(bbox, start, end).items())
for item in items:
    # download needed assets
    # process them into product
    ...

christophfriedrich commented 1 month ago

Hi Julia, sorry for only coming back to you now.

The values were

bbox = [13.215265,51.118933,13.260498,51.147263]
start = "2021-02-15"
end = "2021-04-30"

and the only import used in the code I posted above is

from pystac_client import Client as stac

Feel free to check this specific case. To help you pinpoint it: the error happened while processing 2021-03-22, and for some reason the loop goes backwards in time, i.e. it starts at the end date and works towards the start date. But to be honest, I highly doubt that the error has anything to do with this specific search or this specific item. I think it was a one-time connection issue that happened to occur at that very moment.


Thanks also for the workaround idea. I think it would minimise the risk of this happening again, as the requests are squeezed into a much shorter timespan instead of being stretched over possibly hours. But the bottom line is that if such an error occurs again, I would still be left with an aborted pipeline. To retry the failed request, I would need to restart the whole fetch, and the code would need to look something like this:

search = get_search_result(bbox, start, end)
try:
    items = list(search.items())   # first attempt
except Exception:
    try:
        items = list(search.items())    # second attempt
    except Exception:
        return   # give up after two failures
# if we made it here, it worked
for item in items:
    # download needed assets
    # process them into product
    ...

Which is why I'd repeat my statement/question from the initial post: I think it would be nice if pystac_client would automatically retry failed requests one or two times?
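
As a side note, the nested try/except pattern can be flattened into a loop. The following is only a sketch of a user-side workaround, not something pystac_client provides: `fetch_with_retries`, its parameters, and the fake fetch function used to exercise it are all hypothetical names introduced for illustration.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay_seconds=1.0, retry_on=(Exception,)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that performs the whole fetch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay_seconds * attempt)  # wait a little longer each time


# Demonstration with a fake fetch that fails twice, then succeeds.
calls = {"count": 0}


def fake_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection aborted.")
    return ["item-a", "item-b"]


items = fetch_with_retries(fake_fetch, attempts=3, delay_seconds=0.01)
print(items, calls["count"])
```

With the search from this thread, the call would presumably be `fetch_with_retries(lambda: list(search.items()))`, which restarts the whole fetch on every attempt, just like the nested version.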


But yeah, your idea at least makes it less likely that this will be an issue again, so as a first step I'll implement it. Another question about the details of it:

Meanwhile I came across some code that uses get_all_items() on the search result, which sounded exactly like what I wanted. So I looked it up, found it to be deprecated, with the hint to use item_collection() instead. How is this different from your list(...) on the items() generator?

jsignell commented 1 month ago

I wrote up responses to your comments and then I did a little search to see if this conversation has come up before. It seems that it has (#532) and retries are actually already implemented 🙈 . You can read about how to configure retries in the docs: https://pystac-client.readthedocs.io/en/latest/usage.html#configuring-retry-behavior
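
For reference, a configuration along the lines of the linked docs section might look like the sketch below. The retry count, backoff factor, and status codes are illustrative assumptions rather than recommendations, and the `max_retries` argument of `StacApiIO` requires a pystac-client version that already includes the retry support.

```python
from pystac_client import Client
from pystac_client.stac_api_io import StacApiIO
from urllib3 import Retry

# Illustrative values: up to 3 retries with a growing pause between them.
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[502, 503, 504],
    allowed_methods=None,  # None = retry on any HTTP method, including POST
)
stac_api_io = StacApiIO(max_retries=retry)
catalog = Client.open(
    "https://earth-search.aws.element84.com/v1",
    stac_io=stac_api_io,
)
```

Setting `allowed_methods=None` may matter here because urllib3 does not retry POST requests by default, and searches can be sent as POST.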


This is what I had written before:

"I think it would be nice if pystac_client would automatically retry failed requests one or two times"

Yeah I hear you. I was just wondering if this kind of failure is sporadic (and therefore a good candidate for retries) or a genuine timeout.

"the hint to use item_collection() instead"

Using item_collection is probably the preferred approach.