ross / requests-futures

Asynchronous Python HTTP Requests for Humans using Futures

Efficiently download files asynchronously with requests #54

Closed khavishbhundoo closed 6 years ago

khavishbhundoo commented 6 years ago

I have just started with Python, and my first project involves requests and requests-futures. I would appreciate it if someone could help me.

Details: https://stackoverflow.com/questions/48628510/efficiently-download-files-asynchronously-with-requests

Thanks

ross commented 6 years ago

Hi @khavishbhundoo, the README has some examples of how to use the library in general. If the files are small, following that pattern and writing the response contents out to files once they're finished will work really well.
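
For the small-file case, a minimal sketch of that pattern might look like the following. The helper name `save_when_done`, the `targets` mapping, and the example.com URLs are made up for illustration; the only library calls assumed are `FuturesSession(max_workers=...)` and `session.get()` returning a future, as shown in the README.

```python
def save_when_done(session, targets):
    """Start every request, then write each body to disk once it has finished.

    `session` is anything whose get(url) returns a future resolving to a
    response, e.g. a FuturesSession. `targets` maps url -> destination path
    (a hypothetical structure, not part of the library).
    """
    # Kick off all requests first so they run in the background together.
    pending = {path: session.get(url) for url, path in targets.items()}
    written = []
    for path, future in pending.items():
        resp = future.result()      # blocks until this response is complete
        resp.raise_for_status()
        with open(path, 'wb') as fh:
            fh.write(resp.content)  # the whole body is already in memory
        written.append(path)
    return written


def main():
    # Imported here so the helper above can be tried without the library.
    from requests_futures.sessions import FuturesSession

    session = FuturesSession(max_workers=4)
    # Placeholder URLs and paths, replace with real ones.
    targets = {
        'https://example.com/a.json': '/tmp/a.json',
        'https://example.com/b.json': '/tmp/b.json',
    }
    for path in save_when_done(session, targets):
        print('wrote {}'.format(path))
```

Calling `main()` needs requests-futures installed and network access. Since `resp.content` holds the entire body, this only makes sense when every file comfortably fits in memory.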

If the files are large, however, you'll be outside the designed use case of requests-futures, which is intended more for API/web requests with relatively small payloads that fit in memory. In that case you'd probably be better off writing something a bit custom. I threw the following together to test whether my first thought at what that might look like would work, and it seems to:

#!/usr/bin/env python

from concurrent.futures import ThreadPoolExecutor
from requests import Session
from time import time

url = 'https://github.com/django/django/archive/2.0.2.tar.gz'
executor = ThreadPoolExecutor(max_workers=4)
session = Session()

def log(msg):
    print('{}: {}'.format(time(), msg))

def download(url, filename):
    log('request {}'.format(filename))
    # stream=True defers the body so it is read chunk by chunk, not loaded whole
    resp = session.get(url, stream=True)
    resp.raise_for_status()
    with open(filename, 'wb') as fh:
        # write each chunk (up to 1024 bytes) to disk as it arrives
        for chunk in resp.iter_content(chunk_size=1024):
            log('chunk {}'.format(filename))
            fh.write(chunk)
    log('done {}'.format(filename))
    return filename

log('begin')
futures = []
for i in range(4):
    log('start {}'.format(i))
    future = executor.submit(download, url, '/tmp/download-{}.tar.gz'.format(i))
    futures.append(future)
filenames = [f.result() for f in futures]
log('end')

That's really noisy since it shows every 1024-byte chunk coming down, but the intention was to show that multiple files are streaming at once and complete asynchronously:

...
1518398849.72: chunk /tmp/download-1.tar.gz
1518398849.72: chunk /tmp/download-3.tar.gz
1518398849.72: chunk /tmp/download-3.tar.gz
1518398849.72: chunk /tmp/download-3.tar.gz
1518398849.75: chunk /tmp/download-2.tar.gz
1518398849.75: chunk /tmp/download-3.tar.gz
1518398849.75: chunk /tmp/download-2.tar.gz
1518398849.75: chunk /tmp/download-3.tar.gz
1518398849.75: chunk /tmp/download-2.tar.gz
1518398849.75: chunk /tmp/download-2.tar.gz
1518398849.75: chunk /tmp/download-2.tar.gz
1518398849.75: chunk /tmp/download-3.tar.gz
1518398849.75: chunk /tmp/download-3.tar.gz
1518398849.75: chunk /tmp/download-3.tar.gz
...
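
If the per-chunk logging isn't needed, a quieter variant of the same idea could report each file as it finishes instead. This is only a sketch under a few assumptions: the names `stream_to_file` and `download_all`, the 64 KiB chunk size, and the choice of a separate `requests.get` per download (rather than one shared `Session`) are mine, not something from the script above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_to_file(url, filename, chunk_size=64 * 1024):
    """Stream one response to disk without holding the whole body in memory."""
    # Imported here so download_all below can be tried with any worker function.
    import requests

    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(filename, 'wb') as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return filename


def download_all(jobs, worker=stream_to_file, max_workers=4):
    """Run worker(url, filename) for every job in a thread pool.

    Yields (filename, error) in completion order; error is None on success.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its filename so results can be labelled.
        futures = {executor.submit(worker, url, name): name
                   for url, name in jobs}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                yield name, None
            except Exception as exc:  # report the failure, keep the others going
                yield name, exc
```

Usage would be something like `for name, err in download_all([(url, '/tmp/download-0.tar.gz')]): print(name, err or 'ok')`. Using `as_completed` means one slow or failed download doesn't hold up reporting on the rest.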