Closed TACIXAT closed 9 months ago
Unfortunately this feature is not yet supported.
Tracked under:
axel can download a list of URLs. It can download a single file in parallel, so I suspect it will download multiple URLs in parallel.
Some other options c/o chatgpt
GNU Parallel: This is a powerful tool for running jobs in parallel. You can use it to run multiple wget commands at once. Here's a basic example:
cat urls.txt | parallel -j 10 wget
This command reads URLs from a file urls.txt and uses GNU Parallel to run 10 wget jobs simultaneously.
xargs: Another option is to use xargs with the -P flag for parallel execution. For example:
cat urls.txt | xargs -n 1 -P 10 wget
This runs wget for each URL in urls.txt, with up to 10 downloads in parallel.
Thanks for these. I am a Windows user and a bit of a command prompt purist these days (to force myself to learn).
I usually just implement scraping and downloaders in Python. Been meaning to throw together a parallel downloader in Go. Would be a cool addition here, I'll give this a try next time I am scraping.
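For anyone who wants the Python route in the meantime, here is a rough sketch of a parallel downloader using only the standard library. It is not part of flyscrape and is only one way to do it: the function names (filename_for, download_one, download_all) and the default of 10 workers are my own choices for illustration, the latter mirroring the -j 10 / -P 10 examples above. It assumes every URL ends in a distinct file name, since two URLs with the same name would overwrite each other.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_for(url: str) -> str:
    """Derive a local file name from the last path segment of the URL."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    return name or "index.html"


def download_one(url: str, dest_dir: Path) -> Path:
    """Stream one URL into dest_dir and return the path that was written."""
    target = dest_dir / filename_for(url)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # copyfileobj streams in chunks, so large files are not held in memory
        shutil.copyfileobj(response, out)
    return target


def download_all(urls, dest_dir, workers=10):
    """Download urls concurrently; return (saved, failed) dicts keyed by URL."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved, failed = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, url, dest): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                saved[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                failed[url] = exc
    return saved, failed
```

To use it, read the URLs from a file such as urls.txt into a list and call download_all(urls, "downloads", workers=10). Threads are enough here because the work is I/O-bound. There is no retry or resume logic, so for very large transfers a dedicated tool is probably still the safer choice.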
On Mon, Nov 13, 2023 at 10:43 PM Brian Low @.***> wrote:
axel https://github.com/axel-download-accelerator/axel can download a list of URLs. It can download a single file in parallel, so I suspect it will download multiple URLs in parallel.
File downloads have been added.
Example: https://github.com/philippta/flyscrape/blob/master/examples/download.js
API Reference: https://github.com/philippta/flyscrape#file-downloads
Can this be used for downloading files in parallel?
For example, if I wanted to download 400 GB of image embeddings from https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/img_emb/