opendatacube / odc-tools

ODC features that DEA is experimenting with or prototyping with the intention of being integrated into odc-core in the future
Apache License 2.0

Replace Async AWS Code with Threads #522

Open omad opened 1 year ago

omad commented 1 year ago

Background

To get good performance from AWS S3, it's necessary to parallelise requests.

The odc-aio library provides functions used in the odc-tools CLI applications, and is implemented using Async Python and the aiobotocore library.

This has worked well for several years, providing good performance. However, using async Python, and aiobotocore in particular, comes with several significant drawbacks.

Proposal

An alternative to asynchronous functions for parallelising access to cloud resources is to use old-fashioned threads. To get good S3 performance you only need somewhere between 10 and 50 parallel requests, which threads can easily handle. When used correctly, the boto3 library is thread-safe: low-level clients can generally be shared between threads, whereas sessions and resource objects should not be.

I think work should be put into migrating away from odc-aio to a threaded solution.
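
To make the proposal concrete, here is a minimal sketch of what a threaded fetch could look like. It is not taken from odc-aio or odc-tools: the names `fetch_all` and `make_s3_fetcher` are made up for illustration, and the default of 32 workers is just an assumed value inside the 10-50 range mentioned above. It shares one boto3 client across a `ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple


def fetch_all(
    fetch_one: Callable[[str], bytes],
    keys: Iterable[str],
    max_workers: int = 32,  # assumed default, within the 10-50 range
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, data) pairs in input order, fetching keys concurrently."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map submits every key up front, keeps results in input order,
        # and re-raises any worker exception when that result is reached.
        yield from zip(keys, pool.map(fetch_one, keys))


def make_s3_fetcher(bucket: str) -> Callable[[str], bytes]:
    """Hypothetical helper: build a fetch function around one shared client."""
    import boto3  # imported lazily so fetch_all works without boto3 installed

    s3 = boto3.client("s3")  # created once, then shared by all worker threads

    def fetch_one(key: str) -> bytes:
        # Read the whole object body into memory.
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch_one
```

Usage would be something like `for key, data in fetch_all(make_s3_fetcher("some-bucket"), keys): ...`, where the bucket name is a placeholder. One thing to watch: botocore's connection pool defaults to 10 connections, so with more threads than that it is probably worth passing `config=botocore.config.Config(max_pool_connections=...)` when creating the client.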

History

This was raised in https://github.com/opendatacube/odc-tools/issues/332 but never got to the top of the priority list.

alexgleith commented 1 year ago

There are a few examples in this repo that use threads instead, and I think they're fast and work fine... it's much simpler than async! For example: https://github.com/opendatacube/odc-tools/blob/develop/apps/dc_tools/odc/apps/dc_tools/esa_worldcover_to_dc.py#L185