weiji14 / cryospheric-data-lakes

Big data tools to handle various cryospheric remote sensing datasets, mostly in python.

Bulk download satellite data concurrently via asyncio #2

Open weiji14 opened 7 years ago

weiji14 commented 7 years ago

Use Python's built-in asyncio module (in the standard library since Python 3.4, with the async/await syntax available from Python 3.5) to bulk download satellite data concurrently from HTTP/FTP servers.

See: Hackernoon blog post, asyncio, aioftp docs, aiohttp docs
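As a rough sketch of the idea (not the repository's actual code), the snippet below schedules several downloads at once with `asyncio.gather`. The `fake_download` coroutine and the `example.invalid` URLs are hypothetical stand-ins: a real version would replace the `asyncio.sleep` call with an aiohttp or aioftp request. It assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
import time


async def fake_download(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a real HTTP/FTP request."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"downloaded {url}"


async def download_all(urls):
    # Schedule every download at once; gather returns results in input order.
    return await asyncio.gather(*(fake_download(url) for url in urls))


# Placeholder URLs, not real data sources.
urls = [f"ftp://example.invalid/scene_{i}.tif" for i in range(5)]

start = time.perf_counter()
results = asyncio.run(download_all(urls))
elapsed = time.perf_counter() - start
print(f"{len(results)} downloads finished in {elapsed:.2f} s")
```

Because the five simulated transfers wait concurrently, the total time should be close to one delay rather than the sum of all five.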

weiji14 commented 7 years ago

To go easy on other people's server infrastructure and to avoid FTP error 421 ("Too many connections"), use a semaphore to limit the number of simultaneous FTP connections.

Helpful examples of Python 3 implementations of Semaphores in asyncio:

Official Python3 API docs on the implementation: https://docs.python.org/3/library/asyncio-sync.html#semaphores

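Note the use of the `async with sem:` syntax, which is available from Python 3.5 onwards. Python 3.4 code would use something like `with (yield from sem):` instead, and older Python 3.5+ examples sometimes use `with (await sem):`. Both of those older forms were deprecated in Python 3.7 and removed in Python 3.9.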
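A minimal sketch of the semaphore approach described above, assuming Python 3.7+ and using only the standard library. The limit of 3 connections, the `limited_download` helper, and the placeholder URLs are illustrative assumptions, and `asyncio.sleep` stands in for the real FTP transfer. The `stats` dictionary is there only to show that the limit holds.

```python
import asyncio

MAX_CONNECTIONS = 3  # assumed limit; tune to what the server tolerates


async def limited_download(sem, url, stats):
    async with sem:  # waits here while MAX_CONNECTIONS downloads are active
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # stand-in for the real transfer
        stats["active"] -= 1
    return url


async def main(urls):
    sem = asyncio.Semaphore(MAX_CONNECTIONS)  # created inside the running loop
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(limited_download(sem, url, stats) for url in urls)
    )
    return done, stats["peak"]


# Placeholder URLs, not real data sources.
urls = [f"ftp://example.invalid/file_{i}.nc" for i in range(10)]
done, peak = asyncio.run(main(urls))
print(f"{len(done)} files, at most {peak} connections at once")
```

All ten tasks are created up front, but only `MAX_CONNECTIONS` of them can be inside the `async with sem:` block at any moment, so the server never sees more than that many simultaneous connections from this client.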