Open sgillies opened 1 year ago
On a close read of the code, it looks like only the initial connection is retried: https://github.com/planetlabs/planet-client-python/blob/79d9a3cb952fcd4f75e5e935ee067455580c779d/planet/http.py#L411C38-L411C38. If read timeout errors happen at https://github.com/planetlabs/planet-client-python/blob/79d9a3cb952fcd4f75e5e935ee067455580c779d/planet/clients/orders.py#L259 (for example), they may not be retried.
calling out this very insightful quote for input into the python api docs effort #994:
This project has tended to document order creation and download as tasks that are done together, but that may not be a best practice for large batches of orders.
script to create many orders for testing, run with python create_orders.py >> oids.txt
create_orders.py
import asyncio

import planet


async def create(count=1):
    item_ids = ['20230719_071823_96_2479']

    requests = [planet.order_request.build_request(
        name=str(i),
        products=[
            planet.order_request.product(item_ids=item_ids,
                                         product_bundle='analytic_udm2',
                                         item_type='PSScene')],
        )
        for i in range(count)]

    async with planet.Session() as s:
        client = s.client('orders')
        orders = await asyncio.gather(*[
            _create_order(client, request)
            for request in requests
        ])

    for o in orders:
        print(o['id'])


async def _create_order(client, order_detail):
    with planet.reporting.StateBar(state='creating') as reporter:
        order = await client.create_order(order_detail)
        reporter.update(state='created', order_id=order['id'])
    return order


asyncio.run(create(count=100))
Interestingly, one out of three runs of this script produced the following error:
Traceback (most recent call last):
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 279, in _raise_for_status
response.raise_for_status()
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/httpx/_models.py", line 749, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://api.planet.com/compute/ops/orders/v2'
For more information check: https://httpstatuses.com/500
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "create_orders.py", line 35, in <module>
asyncio.run(create(count=100))
File "/Users/jennifer.kyle/.pyenv/versions/3.8.6/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/jennifer.kyle/.pyenv/versions/3.8.6/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "create_orders.py", line 19, in create
orders = await asyncio.gather(*[
File "create_orders.py", line 30, in _create_order
order = await client.create_order(order_detail)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/clients/orders.py", line 148, in create_order
response = await self._session.request(method='POST',
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 387, in request
http_response = await self._retry(self._send, request, stream=False)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 330, in _retry
raise e
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 315, in _retry
resp = await func(*a, **kw)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 393, in _send
http_resp = await self._client.send(request, stream=stream)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/httpx/_client.py", line 1617, in send
response = await self._send_handling_auth(
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
response = await self._send_handling_redirects(
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/httpx/_client.py", line 1703, in _send_handling_redirects
raise exc
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/httpx/_client.py", line 1685, in _send_handling_redirects
await hook(response)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 282, in _raise_for_status
cls._convert_and_raise(e)
File "/Users/jennifer.kyle/.pyenv/versions/planet-client-python-3.8.6/lib/python3.8/site-packages/planet/http.py", line 95, in _convert_and_raise
raise error_type(response.text)
planet.exceptions.ServerError: Internal Server Error
Speaking to the above error, what is needed is asyncio.gather with return_exceptions=True, so that the whole process doesn't error out if one error is encountered, and possibly adding exceptions.ServerError to http.RETRY_EXCEPTIONS.
create_orders.py
import asyncio

import planet


async def create(count=1):
    item_ids = ['20230719_071823_96_2479']

    requests = [planet.order_request.build_request(
        name=str(i),
        products=[
            planet.order_request.product(item_ids=item_ids,
                                         product_bundle='analytic_udm2',
                                         item_type='PSScene')],
        )
        for i in range(count)]

    async with planet.Session() as s:
        client = s.client('orders')
        orders = await asyncio.gather(*[
            _create_order(client, request)
            for request in requests
        ], return_exceptions=True)


async def _create_order(client, order_detail):
    with planet.reporting.StateBar(state='creating') as reporter:
        order = await client.create_order(order_detail)
        reporter.update(state='created', order_id=order['id'])
        print(order['id'])


asyncio.run(create(count=100))
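With return_exceptions=True, asyncio.gather returns exception objects in place of results, so failed creations pass silently unless the caller inspects the returned list. A minimal sketch of that inspection follows. It uses a hypothetical fake_create_order coroutine in place of client.create_order so that it runs without API credentials; split_results and create_all are likewise illustrative names, not part of the planet package.

```python
import asyncio


async def fake_create_order(i):
    # Hypothetical stand-in for client.create_order(); every third call fails.
    await asyncio.sleep(0)
    if i % 3 == 0:
        raise RuntimeError(f'Internal Server Error (request {i})')
    return {'id': f'order-{i}'}


def split_results(requests, results):
    """Separate successful results from exceptions returned by gather."""
    succeeded, failed = [], []
    for request, result in zip(requests, results):
        if isinstance(result, BaseException):
            failed.append((request, result))
        else:
            succeeded.append(result)
    return succeeded, failed


async def create_all(count):
    requests = list(range(count))
    results = await asyncio.gather(
        *[fake_create_order(r) for r in requests],
        return_exceptions=True)
    return split_results(requests, results)


if __name__ == '__main__':
    succeeded, failed = asyncio.run(create_all(6))
    for order in succeeded:
        print(order['id'])
    for request, error in failed:
        print(f'Failed request {request}: {error}')
```

Keeping the failed requests alongside their exceptions makes it possible to resubmit only those requests in a second pass.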
script to download many orders that were created already and recorded in oids.txt. For logging, use python download_orders.py > log.txt
download_orders.py
import asyncio

import planet


async def download(count=1, directory_path='downloads'):
    async with planet.Session() as s:
        client = s.client('orders')

        with open('oids.txt', 'r') as f:
            order_ids = f.readlines()

        oids = [order_id.strip() for order_id in order_ids[:count]]

        res = await asyncio.gather(*[
            _download_order(client, oid, directory_path)
            for oid in oids], return_exceptions=True)

    for res in zip(oids, res):
        if issubclass(type(res[1]), Exception):
            print(f'Failed download: {res[0]}')
            print(res[1])
        else:
            print(f'Successful download: {res[0]}')


async def _download_order(client, order_id, directory):
    with planet.reporting.StateBar(state='waiting') as reporter:
        await client.wait(order_id, callback=reporter.update_state, max_attempts=0, delay=7)
        await client.download_order(order_id, directory, progress_bar=True, overwrite=True)


asyncio.run(download(count=100))
This works now. I refactored the code a little bit for my purpose and it looks like this.
# Imports needed by the snippet (not shown in the original post).
import asyncio
import os
from datetime import datetime

import pandas as pd
import planet
from planet import Auth


def activate_and_download_orders(api_key, input_file):
    auth = Auth.from_key(api_key)

    def activate_order_wrapper(input_file):
        list_of_order_ids = []

        async def create(df):
            list_of_requests = []
            for index, row in df.iterrows():
                temp_date = datetime.strptime(row['fulfilled_date'], "%Y-%m-%d").strftime("%Y%m%d")
                name = f"{temp_date}_SKYSAT_{row['order_name']}"
                item_ids = eval(row['item_id'])
                # Flatten the nested list of item ids.
                item_ids = [item for sublist in item_ids for item in sublist]
                list_of_requests.append(planet.order_request.build_request(
                    name=name,
                    products=[  # see if delivery function of order_request be used here to directory download zip file
                        planet.order_request.product(item_ids=item_ids,
                                                     product_bundle='pansharpened_udm2',
                                                     item_type='SkySatCollect')],
                    delivery=planet.order_request.delivery(
                        archive_type='zip',
                        single_archive=True,
                        archive_filename=f'{name}.zip')))

            async with planet.Session(auth=auth) as s:
                client = s.client('orders')
                orders = await asyncio.gather(*[
                    _create_order(client, request)
                    for request in list_of_requests
                ], return_exceptions=True)

        async def _create_order(client, order_detail):
            with planet.reporting.StateBar(state='creating') as reporter:
                order = await client.create_order(order_detail)
                reporter.update(state='created', order_id=order['id'])
                list_of_order_ids.append(order['id'])

        df = pd.read_csv(input_file)
        asyncio.run(create(df))
        return list_of_order_ids

    list_of_orders_to_be_downloaded = activate_order_wrapper(input_file)

    def download_order_wrapper(list_of_orders_to_be_downloaded):
        directory = "./orders"
        # Check if the directory exists
        if not os.path.exists(directory):
            # Create the directory
            os.makedirs(directory)
        else:
            pass

        async def download(list_of_orders_to_be_downloaded, directory_path):
            async with planet.Session(auth=auth) as s:
                client = s.client('orders')
                oids = [order_id for order_id in list_of_orders_to_be_downloaded]
                res = await asyncio.gather(*[
                    _download_order(client, oid, directory_path)
                    for oid in oids], return_exceptions=True)

            for res in zip(oids, res):
                if issubclass(type(res[1]), Exception):
                    print(f'Failed download: {res[0]}')
                    print(res[1])
                else:
                    print(f'Successful download: {res[0]}')

        async def _download_order(client, order_id, directory):
            with planet.reporting.StateBar(state='waiting') as reporter:
                await client.wait(order_id, callback=reporter.update_state, max_attempts=0, delay=7)
                await client.download_order(order_id, directory, progress_bar=True, overwrite=True)

        asyncio.run(download(list_of_orders_to_be_downloaded, directory))

    download_order_wrapper(list_of_orders_to_be_downloaded)
    # for UNIX systems
    # write the same for Windows Anaconda Prompt
    os.system('mv ./orders/*/*.zip ./orders/')
    # os.system(move /Y .\orders\*\*.zip .\orders\)
This can be even better if the user types two commands: one for activation and one for downloading. I am gonna ask them if they will agree to it.
But it still failed for two of the orders, and I am unsure why. I just got a "Failed download" error.
Yeah, we still need to add retry to the download. I'm working on that. These scripts are mostly designed to home in on and trigger the error, which they are doing spectacularly =) And the idea is that they won't trigger the error when retry is added. Stay tuned!
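Until retry lands in the client, a caller-side wrapper is one possible stopgap. The sketch below is a generic retry with exponential backoff and jitter around any async callable; it is not the fix being worked on in the SDK. The names retry and FlakyDownload are hypothetical, and FlakyDownload only simulates a download that hits read timeouts so the example runs without the planet package.

```python
import asyncio
import random


async def retry(func, *args, max_attempts=5, base_delay=1.0,
                retry_on=(Exception,), **kwargs):
    """Call an async function, retrying with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await func(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


class FlakyDownload:
    """Hypothetical stand-in for a download that times out a few times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def __call__(self, order_id):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError('read timeout')
        return f'{order_id}.zip'


if __name__ == '__main__':
    flaky = FlakyDownload(failures=2)
    path = asyncio.run(retry(flaky, 'abc', base_delay=0.1,
                             retry_on=(TimeoutError,)))
    print(f'{path} after {flaky.calls} attempts')
```

In the download script, the same wrapper could presumably be applied to the per-order download coroutine, with retry_on narrowed to the timeout and connection errors actually observed.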
Under high download concurrency, httpcore and httpx errors propagate up from the StreamingBody instance at https://github.com/planetlabs/planet-client-python/blob/main/planet/clients/orders.py#L259. These errors do not manifest at lower concurrency. Streaming responses is a strategy used to keep the memory footprint of programs manageable while downloading multiple large (up to ~100 MB) TIFFs concurrently.
Possible lead: the same kind of asyncio.exceptions.CancelledError is mentioned at https://github.com/agronholm/anyio/issues/534, which was closed with the conclusion that callers have to expect read timeouts and work around them.
Possible workaround: separate order creation from order download. Order creation is more reliable and, when it does fail, fails differently. It is probably less complicated to retry order downloads if they are de-interleaved from order creation. This project has tended to document order creation and download as tasks that are done together, but that may not be a best practice for large batches of orders.
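Since the errors do not appear at lower concurrency, another option to explore is capping the number of downloads in flight. The following is a minimal sketch using asyncio.Semaphore; gather_limited and Tracker are hypothetical names, and Tracker only simulates downloads so that the peak concurrency can be observed without the planet package. Whether a cap actually avoids the read timeouts is an assumption that would need testing.

```python
import asyncio
import functools


async def gather_limited(coro_funcs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(func):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await func()

    return await asyncio.gather(*[run(f) for f in coro_funcs],
                                return_exceptions=True)


class Tracker:
    """Hypothetical download simulator that records peak concurrency."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def download(self, order_id):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)
        self.active -= 1
        return order_id


if __name__ == '__main__':
    tracker = Tracker()
    funcs = [functools.partial(tracker.download, i) for i in range(10)]
    results = asyncio.run(gather_limited(funcs, limit=3))
    print(results, 'peak concurrency:', tracker.peak)
```

Passing coroutine functions rather than already-created coroutines keeps each download from starting until it holds a semaphore slot.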
Traceback 1:
Traceback 2:
cc @aayushmalik