Open omad opened 1 year ago
There are a few examples in this repo that use threads instead, and I think they work fast and fine. It's much simpler than async! For example: https://github.com/opendatacube/odc-tools/blob/develop/apps/dc_tools/odc/apps/dc_tools/esa_worldcover_to_dc.py#L185
Background
To get good performance from AWS S3, it's necessary to parallelise requests.
The odc-aio library provides functions used in the odc-tools CLI applications, and is implemented using async Python and the aiobotocore library. This has worked well for several years, providing good performance. However, using async Python, and aiobotocore in particular, comes with several significant drawbacks.
Proposal
An alternative to using asynchronous functions to parallelise access to cloud resources is to use old-fashioned threads. Good S3 performance only needs somewhere between 10 and 50 parallel requests, which threads can easily handle. When used correctly, the boto3 library is thread safe.
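As a rough illustration of the threaded approach, here is a minimal sketch using the standard library's ThreadPoolExecutor. The names fetch_many and make_s3_fetcher are hypothetical and not part of odc-tools or odc-aio; the default of 32 workers is just a value picked from the 10-50 range. It assumes a single low-level boto3 client is shared across threads (boto3 documents clients as generally thread safe, while Session and Resource objects are not), and that the botocore connection pool is sized to match the number of threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def fetch_many(
    fetch_one: Callable[[str], bytes],
    keys: Iterable[str],
    max_workers: int = 32,
) -> List[Tuple[str, bytes]]:
    """Call fetch_one(key) for every key on a thread pool.

    Returns (key, data) pairs in the same order as the input keys.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises any worker exception
        # when the corresponding result is consumed.
        return list(zip(keys, pool.map(fetch_one, keys)))


def make_s3_fetcher(bucket: str, max_workers: int = 32) -> Callable[[str], bytes]:
    """Build a fetch_one function backed by one shared boto3 S3 client."""
    # Imported lazily so fetch_many itself has no hard dependency on boto3.
    import boto3
    from botocore.config import Config

    # Assumption: one client shared by all threads. The connection pool
    # defaults to 10 connections, so size it to the thread count.
    s3 = boto3.client("s3", config=Config(max_pool_connections=max_workers))

    def fetch_one(key: str) -> bytes:
        # Read the whole object body into memory.
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch_one
```

Usage would look something like fetch_many(make_s3_fetcher("my-bucket"), keys), where "my-bucket" is a placeholder. Keeping the pool logic separate from the S3 call means the same helper can be exercised with any plain function, without AWS credentials.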
I think work should be put into migrating away from odc-aio and using a threaded solution instead.
History
This was raised in https://github.com/opendatacube/odc-tools/issues/332 but never got to the top of the priority list.