planetlabs / planet-client-python

Python client for Planet APIs
https://planet-sdk-for-python-v2.readthedocs.io/en/latest/
Apache License 2.0
274 stars 92 forks source link

Add retries to file download to avoid throwing an exception in the case of normal retry errors #1002

Closed jreiberkyle closed 1 year ago

jreiberkyle commented 1 year ago

Related Issue(s):

Closes #974

Proposed Changes:

For inclusion in changelog (if applicable):

  1. Orders and data downloads now incorporate retries and worker limits. Retries make them more robust in the case of common http errors and including them in the worker limits reduces the chance of overloading the server and triggering the errors in the first place. For this to happen, the write functionality has been moved from models.StreamingBody (which has been removed) to http.Session

Side note: Adding functionality while deleting a net of 59 lines seems like a win =)

Things I didn't do but believe would be a good idea:

  1. include ServerError as a retryable http error (this makes many subscription failure tests very long because they are using ServerError to initiate a failure)
  2. Move tests/unit/test_Session.py into tests/unit/test_http.py so all Session tests are together
  3. Move the rest of models.py into http.py, it's getting pretty small without StreamingBody

Not intended for changelog:

  1. models.Response now has new filename and length properties which are None unless the response is from a GET request for a downloadable resource.

Diff of User Interface

Old behavior:

Given a list of activated order ids stored in oids.txt, the following script would result in some downloads failing due to http errors:

import asyncio
from collections import Counter

import planet

import logging; logging.basicConfig(filename='log.txt', level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

async def download(count=1, directory_path='downloads'):
    async with planet.Session() as s:
        client = s.client('orders')

        with open('oids.txt', 'r') as f:
            order_ids = f.readlines()

        oids = [order_id.strip() for order_id in order_ids[:count]]

        res = await asyncio.gather(*[
            _download_order(client, oid, directory_path)
                    for oid in oids], return_exceptions=True)

        errors =[r for r in res if issubclass(type(r), Exception)]

        print(Counter([type(r) for r in errors]))
        if errors:
            raise(errors[0])

async def _download_order(client, order_id, directory):
    with planet.reporting.StateBar(state='polling') as reporter:
        await client.wait(order_id, callback=reporter.update_state, max_attempts=0, delay=7)
        reporter.update_state('downloading')
        await client.download_order(order_id, directory=directory, progress_bar=False, overwrite=True)
        reporter.update_state('downloaded')

asyncio.run(download(count=100))

New behavior:

All downloads are successful

PR Checklist:

(Optional) @mentions for Notifications:

jreiberkyle commented 1 year ago

@sgillies merging to prepare a release. You will get a chance to review before the release as you will be the release manager =)