mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

deviantart download limit from single artist/folder? #4665

Open colin-heberling opened 1 year ago

colin-heberling commented 1 year ago

Is there a maximum number of images that can be downloaded in a single session from a single artist or folder? I've noticed that for artists with large galleries of over 10k or so images, I never seem to reach 10k downloaded files; it stops at around 9.5k. But if I download from the individual folders on that artist's profile, provided those folders have fewer than 9.5k images, I can download everything just fine. However, it's much easier to keep up with the latest images of an artist if I can download straight from their entire profile, so having to download from individual folders every time is not ideal, especially when oftentimes these artists don't properly update their individual folders, and many of their images may only be accessible within the main profile.

mikf commented 1 year ago

According to the API docs, the upper limit is 50,000 posts, but at least one other issue has reported not being able to download everything from a folder. Maybe DA broke something again.

What you could try is using something like --range 10000- to have it start from offset 10000 instead of the very beginning. You can investigate DA's responses with --write-pages. There is probably a point where it only returns an empty response with "has_more": false, which causes gallery-dl to stop.
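The stopping behaviour described above can be modelled with a small offset-based pagination loop. This is a simulation, not gallery-dl's code: the page shape (results, has_more, next_offset) follows DeviantArt's API responses, while paginate, make_fake_api, the page size of 24 and the 10,000-item cutoff are hypothetical stand-ins for what this thread reports.

```python
from typing import Callable, Iterator

# A page mirrors the general shape of DeviantArt API gallery responses:
# {"results": [...], "has_more": bool, "next_offset": int or None}
Page = dict

def paginate(fetch_page: Callable[[int], Page], start: int = 0) -> Iterator[dict]:
    """Yield items page by page, stopping as soon as has_more is false."""
    offset = start
    while True:
        page = fetch_page(offset)
        yield from page["results"]
        if not page["has_more"]:
            break  # an empty page with has_more=false ends everything here
        offset = page["next_offset"]

def make_fake_api(total: int, cutoff: int, page_size: int = 24) -> Callable[[int], Page]:
    """Hypothetical endpoint: `total` items exist, but none are served at or past `cutoff`."""
    def fetch_page(offset: int) -> Page:
        if offset >= cutoff:
            return {"results": [], "has_more": False, "next_offset": None}
        end = min(offset + page_size, total, cutoff)
        return {
            "results": [{"index": i} for i in range(offset, end)],
            "has_more": end < total,
            "next_offset": end,
        }
    return fetch_page

api = make_fake_api(total=23000, cutoff=10000)
print(len(list(paginate(api))))               # 10000: stops well short of 23000
print(len(list(paginate(api, start=10000))))  # 0: nothing left past the cutoff
```

In this model, starting at offset 10000 returns nothing at all, which would match an empty "has_more": false response; whether DA really behaves this way is what --write-pages would show.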

> However, it's much easier to keep up with the latest images of an artist if I can download straight from their entire profile, so having to download from individual folders every time is not ideal, especially when oftentimes these artists don't properly update their individual folders, and many of their images may only be accessible within the main profile.

You could download everything from folders first and then use the main profile URL to update your collection.

Also, -o flat=false together with the main profile URL will automatically download from all folders.

colin-heberling commented 1 year ago

The range option has turned out to be very useful for finishing collections when I'm missing images in the middle, but I don't think it fixes my problem. When I've tried to start at 10,000 with the range option for artists that had ~27k or 65k images, DeviantArt returns an error saying there's nothing left.

I also wrote a script that breaks a download job into multiple chunks of images based on a user-supplied maximum image index, starting index, and step count per job. It seems to work perfectly for small galleries, but when I tried to break the 27k gallery into chunks of 5000, it behaved strangely. It succeeded on the first 5000, but from about 5001 to 6999 it returned repeats of images it had already downloaded, and with the terminate option it skipped these unless I tried again with a lower step count. So on my first run, it downloaded the first 5000, failed to download 5001-5010 with a terminate limit of 10, then skipped to 10,000 and returned nothing. It then skipped to 15,000, 20,000, etc., returning no images for those index ranges.
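A chunking script like the one described could look roughly like this. It is a hypothetical reconstruction, not the script from this comment: range_chunks and build_commands are illustrative names, and the URL and destination directory are placeholders. It only builds 1-based, inclusive --range values and prints the matching gallery-dl command lines. Note that chunking cannot get past the cutoff itself; any chunk beyond the point where DA stops returning results will still come back empty.

```python
import shlex

def range_chunks(start: int, stop: int, step: int) -> list[str]:
    """Split the 1-based, inclusive index span start..stop into --range values."""
    if start < 1 or step < 1 or stop < start:
        raise ValueError("need 1 <= start <= stop and step >= 1")
    chunks = []
    lo = start
    while lo <= stop:
        hi = min(lo + step - 1, stop)
        chunks.append(f"{lo}-{hi}")
        lo = hi + 1
    return chunks

def build_commands(url: str, dest: str, start: int, stop: int, step: int) -> list[list[str]]:
    """One gallery-dl argument list per chunk (not executed here)."""
    return [["gallery-dl", url, "-d", dest, "--range", r]
            for r in range_chunks(start, stop, step)]

# Placeholder URL and directory; substitute the real artist and destination.
for cmd in build_commands("https://www.deviantart.com/USERNAME/gallery/all",
                          "downloads", 1, 27000, 5000):
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run to actually execute the chunks one after another.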

Is this something that can be fixed on our end, or is this strictly DA's fault?

Also, it turns out that some of these artists with massive numbers of images have not organized every single image into folders; some can only be accessed from the artist's main gallery. I contacted the artist I tested on, and they said they were at their technical limit for the number of folders, which is why the majority of their newest images had been uploaded to their main gallery only.

mikf commented 1 year ago

> Is this something that can be fixed on our end, or is this strictly DA's fault?

--range not properly working all the time is probably gallery-dl's fault, but the general problem of not being able to download more than 10k images per folder/collection is on DA and I don't think that can be fixed.

Twi-Hard commented 1 year ago

You can try scraping archive.org for links related to the account you want. It's best to use the CDX API to get a list of everything. It seems DeviantArt URLs always contain the username, either before or after the domain name. Years ago, account URLs looked like https://username.deviantart.com/ rather than the current https://www.deviantart.com/username/, so check for both.

Here's the CDX API and how to use it: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md
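A minimal sketch of the archive.org approach, assuming the CDX server parameters documented in the README linked above (url, matchType, output, fl, collapse). It builds one query for each of the two profile URL forms; cdx_query_urls and fetch_archived_urls are illustrative names, and the username passed in is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_urls(username: str) -> list[str]:
    """Build CDX queries for both the old and the current profile URL forms."""
    patterns = [
        f"{username}.deviantart.com/",      # old form: https://username.deviantart.com/
        f"www.deviantart.com/{username}/",  # current form: https://www.deviantart.com/username/
    ]
    urls = []
    for pattern in patterns:
        params = {
            "url": pattern,
            "matchType": "prefix",  # everything under this URL prefix
            "output": "json",
            "fl": "original",       # only return the originally archived URL
            "collapse": "urlkey",   # one row per distinct URL
        }
        urls.append(f"{CDX_ENDPOINT}?{urlencode(params)}")
    return urls

def fetch_archived_urls(query_url: str) -> list[str]:
    """Run one CDX query (network access required) and return the archived URLs."""
    with urlopen(query_url, timeout=120) as resp:
        rows = json.load(resp)
    # With output=json the first row is the header (["original"]); the rest are results.
    return [row[0] for row in rows[1:]]
```

For example, fetch_archived_urls(cdx_query_urls("someartist")[1]) would list the archived URLs under the current profile form; large accounts may need the paging options described in the README.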

colin-heberling commented 1 year ago

I opened an issue about this for the DeviantArt API on DA's GitHub page, but nobody has responded yet. I'll have to try the archive.org approach when I get time; I hope it's not too bad.

StickyChannel92 commented 1 week ago

I'm having the same issue. The artist I'm trying to download from has nearly 23k deviations, and it stopped after around 10k had been downloaded. New uploads download just fine, but nothing below that point does.

gallery-dl https://www.deviantart.com/ChloeDH1001/ -d DRAW002
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024-
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024- -o flat=false

The last command yielded unique results that haven't been downloaded yet, but the others just say that there are no results for the URL.

Hrxn commented 1 week ago

The 10k working entries: are they (roughly) determined by age? Or, put differently, do the older ones among the alleged 23k deviations work in the browser?

StickyChannel92 commented 1 week ago

On my end, it's very hard to tell from either my phone or a [very old and temporary] Windows 7 laptop. I don't know if it's an OS issue, but on Windows 10 the downloaded files get their date-modified attribute set to the date the artist uploaded the image. That doesn't happen on Windows 7, and I can't seem to sort the files by date properly, either on that old laptop or in the DeviantArt web client. (The browser is Vivaldi, the latest release available for Windows 7, which is typically not the latest version of the browser overall, since the newest version doesn't support Windows 7.)

mikf commented 1 week ago

> gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024- -o flat=false
>
> The last command yielded unique results that haven't been downloaded yet, but the others just say that there are no results for the URL.

Run this last command without --range. This will fetch deviations from each individual folder instead of /all and not start from offset 10024 for each one of them.

gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 -o flat=false
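To illustrate the point about --range with flat=false: if the range is applied to each folder's own index separately, as described above, a start of 10024 selects nothing from any folder with fewer than 10,024 items. The folder names and sizes below are made up, and selected_per_folder is a hypothetical helper.

```python
def selected_per_folder(folder_sizes: dict[str, int], first: int) -> dict[str, int]:
    """Count items an open-ended range "FIRST-" selects in each folder (1-based, inclusive)."""
    return {name: max(0, size - first + 1) for name, size in folder_sizes.items()}

# Hypothetical folder sizes for one artist
folders = {"Sketches": 3000, "Comics": 12000, "Scraps": 800}
print(selected_per_folder(folders, 10024))
# {'Sketches': 0, 'Comics': 1977, 'Scraps': 0}
```

Dropping --range avoids this, so every folder is fetched from its first item.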

> that doesn't happen on Windows 7

Try --mtime date. Maybe that works, although it uses a different mtime value than the default settings would.

StickyChannel92 commented 19 hours ago

> Run this last command without --range. This will fetch deviations from each individual folder instead of /all and not start from offset 10024 for each one of them.
>
> gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 -o flat=false

That did download new stuff, but only 8678 files, compared to the full ~23K. In total it went through 18,988 files: 8678 new and 10,219 already downloaded. I also now have a problem with duplicate files, since I can't find a way to make gallery-dl check only for new files; it has to go through the entire list to see what has already been downloaded before it downloads any new files.
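If the duplicates are byte-identical copies saved in different folders, they can at least be found after the fact. This is a generic sketch, not a gallery-dl feature: it groups files under a download directory (such as DRAW002) first by size, then by SHA-256 hash, and only reports the groups, without deleting anything. find_duplicates and file_digest are illustrative names; re-encoded copies of the same image would not be detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing it
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

For example, find_duplicates(Path("DRAW002")) returns a list of groups; within each group, every file but one is a redundant copy.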