jjjake / internetarchive

A Python and Command-Line Interface to Archive.org
GNU Affero General Public License v3.0

multithreaded concurrent downloads? #622

Open · Betonhaus opened this issue 8 months ago

Betonhaus commented 8 months ago

Would it be possible to add the ability to download multiple files simultaneously when doing bulk downloads? That way, if one thread is stuck on a really big and/or slow file, the other threads could work through the remaining downloads while that one keeps chugging away. Downloading all the files at once using xargs bogs down the system. A way to set a maximum number of threads, so that it downloads only 10 files at once instead of spawning 1,000 separate threads, would be much faster.

JustAnotherArchivist commented 8 months ago

Duplicate of #412

Some implementations of xargs (e.g. GNU's) have a -P option to specify the concurrency. There's also GNU Parallel and others.
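For anyone scripting this in Python instead of the shell, a thread pool with a fixed `max_workers` behaves much like `xargs -P`. At most N downloads are in flight, and each worker picks up the next file as soon as it finishes, so one slow file ties up only one worker. This is a minimal sketch, not part of ia. `download_all` and `InFlightCounter` are hypothetical names, and the counter is only a stand-in for a real per-file download function.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(names, fetch, max_workers=10):
    """Call fetch(name) for every name, with at most max_workers calls running at once.

    Each worker takes the next pending name as soon as it finishes its current one,
    so a single large or slow file occupies only one worker (similar to `xargs -P`).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order and re-raises any exception from fetch.
        return list(pool.map(fetch, names))


class InFlightCounter:
    """Hypothetical stand-in for a real download function that records peak concurrency."""

    def __init__(self, delay=0.01):
        self._lock = threading.Lock()
        self.delay = delay
        self.current = 0
        self.peak = 0

    def __call__(self, name):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(self.delay)  # simulates network I/O
        with self._lock:
            self.current -= 1
        return name


if __name__ == "__main__":
    fake_fetch = InFlightCounter()
    files = [f"file{i:04d}" for i in range(1000)]
    download_all(files, fake_fetch, max_workers=10)
    print(fake_fetch.peak)  # never more than 10
```

A thread pool is a reasonable fit here because downloads spend most of their time waiting on the network, so the interpreter lock is not the bottleneck.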

Betonhaus commented 8 months ago

Is there a way to get ia to lock the file it's downloading? When an archive is downloading too slowly, I open additional tabs and download parts of it separately (e.g., when downloading get-smart, I started the full download and then later started downloads that each fetched only one season).

Eventually the first download catches up with the others, skips the files they have already downloaded, and then starts downloading the exact same file that another instance is already downloading. Ideally, it would notice that the file is already being downloaded and skip to the next one. Also, cancelling one of the downloads causes the file to be deleted or corrupted.
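The behavior requested here can be sketched outside of ia. Each worker claims a file by atomically creating a `.lock` file next to it, and it skips files that are already claimed or already present. Data is written to a `.part` file that is renamed into place only once complete, so cancelling a download never leaves a truncated file under the real name. This is a hypothetical helper, not how ia behaves. `claim`, `release`, `download_with_lock`, and the `.lock`/`.part` suffixes are illustrative choices. `fetch_chunks` stands in for the actual network read.

```python
import os
import tempfile


def claim(path):
    """Atomically create path + '.lock'; return False if another worker already holds it."""
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the lock file already exists.
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(path):
    """Remove the lock so other workers may retry the file."""
    os.remove(path + ".lock")


def download_with_lock(path, fetch_chunks):
    """Hypothetical helper: download one file unless it exists or is claimed elsewhere.

    fetch_chunks is a zero-argument callable yielding bytes (the real network read).
    Data goes to path + '.part' and is renamed to path only when complete.
    """
    if os.path.exists(path) or not claim(path):
        return False  # finished or in progress elsewhere: skip to the next file
    try:
        if os.path.exists(path):  # re-check: another worker may have just finished it
            return False
        part = path + ".part"
        with open(part, "wb") as f:
            for chunk in fetch_chunks():
                f.write(chunk)
        os.replace(part, path)  # atomic rename within one filesystem
        return True
    finally:
        release(path)  # runs on success, skip, or error (e.g. Ctrl-C)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "episode01.mkv")
        print(download_with_lock(target, lambda: [b"data"]))  # True
        print(download_with_lock(target, lambda: [b"data"]))  # False: already present
```

A lock left behind by a hard kill (e.g. SIGKILL) would have to be removed by hand. Exclusive file creation can also be unreliable on some older network filesystems.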

vxbinaca commented 6 months ago

Try "parallel" (GNU Parallel) if you're downloading multiple items, but if it's a single item, you're SOL.
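With the CLI alone that may be the case. From Python, though, a single item's files can be handed to a thread pool one file per task. This is a minimal sketch, assuming the `internetarchive` package's `get_item(identifier).files` (a list of dicts with a `"name"` key) and `download(identifier, files=..., destdir=..., ignore_existing=True)`. Check the library's API reference before relying on it. `list_item_files`, `fetch_file`, and `download_item_concurrently` are hypothetical helpers, not part of the library.

```python
from concurrent.futures import ThreadPoolExecutor


def list_item_files(identifier):
    """Return the names of all files in an archive.org item."""
    import internetarchive  # lazy import: only needed for real downloads

    return [f["name"] for f in internetarchive.get_item(identifier).files]


def fetch_file(identifier, name, destdir="."):
    """Download one file from an item; files already on disk are skipped."""
    import internetarchive

    internetarchive.download(identifier, files=[name], destdir=destdir,
                             ignore_existing=True)
    return name


def download_item_concurrently(identifier, names, max_workers=4, fetch=fetch_file):
    """Fetch the given files of ONE item with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per file, so a huge file occupies only one worker.
        results = pool.map(lambda name: fetch(identifier, name), names)
        return sorted(results)


if __name__ == "__main__":
    import sys

    item = sys.argv[1]  # an archive.org identifier passed on the command line
    download_item_concurrently(item, list_item_files(item), max_workers=4)
```

Because each file is submitted exactly once within one process, this avoids the duplicate-download problem described above without needing lock files.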