Open colin-heberling opened 1 year ago
According to the API docs, the upper limit is 50,000 posts, but there has been at least one other issue that reported not being able to download everything from a folder. Maybe DA broke something again.
What you could try is using something like --range 10000- to have it start from offset 10000 instead of the very beginning. You can investigate DA's responses with --write-pages. There is probably a point where it only returns an empty response with "has_more": false, which causes gallery-dl to stop.
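To illustrate why an empty page with "has_more": false ends the run early, here is a minimal simulation of offset-based pagination. It is only a sketch: `fake_api`, the `server_cap` value, and the `next_offset` field name are assumptions made for the example, not gallery-dl's actual code or a statement about DA's real limit.

```python
from typing import Callable, Dict, Iterator


def fake_api(total: int, server_cap: int, page_size: int = 24) -> Callable[[int], Dict]:
    """Build a hypothetical endpoint.

    `total` items exist, but the (assumed) server refuses to serve anything
    at or beyond offset `server_cap`.
    """
    limit = min(total, server_cap)

    def fetch(offset: int) -> Dict:
        end = min(offset + page_size, limit)
        results = list(range(offset, end)) if offset < end else []
        # Past the cap the server answers with an empty page and has_more=False.
        return {"results": results, "has_more": end < limit, "next_offset": end}

    return fetch


def paginate(fetch: Callable[[int], Dict], start_offset: int = 0) -> Iterator[int]:
    """Follow pages until the server reports has_more=False."""
    offset = start_offset
    while True:
        page = fetch(offset)
        yield from page["results"]
        if not page["has_more"]:
            break  # the client has no way to tell "cap reached" from "gallery finished"
        offset = page["next_offset"]
```

With `fake_api(total=23000, server_cap=10000)`, iterating from offset 0 stops after 10000 items, and starting at offset 10000 yields nothing at all, which matches the symptoms reported in this thread.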
However, it's much easier to keep up with the latest images of an artist if I can download straight from their entire profile, so having to download from individual folders every time is non-ideal, especially when often times these artists don't properly update their individual folders, and many of their images may only be accessible within the main profile.
You could download everything from folders first and then use the main profile URL to update your collection.
Also, -o flat=false plus the main profile URL will automatically download from all folders.
The range option has turned out to be very useful for finishing collections when I'm missing images in the middle, but I don't think it fixes my problem. When I tried to start at 10,000 with the range option for artists that had ~27k or 65k images, DeviantArt returned an error saying there's nothing left.
I also wrote a script that breaks a download job into chunks based on a user-supplied maximum image index, starting index, and step count per job. It seems to work perfectly for small galleries, but when I used it to split the 27k gallery into chunks of 5000, it behaved strangely. The first 5000 succeeded, but roughly 5001-6999 were repeats of images it had already downloaded, and with the terminate option these were skipped unless I tried again with a lower step count. So on my first run, it downloaded the first 5000, failed to download 5001-5010 with a terminate limit of 10, then skipped to 10,000 and returned nothing. It then skipped to 15000, 20000, etc., returning no images for these index ranges.
Is this something that can be fixed on our end, or is this strictly DA's fault?
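For reference, a chunking script like the one described above could look roughly like this. It is a sketch under assumptions, not the poster's actual script: the function names and the example destination are made up, and it only builds gallery-dl command lines using the --range and -d options that appear in this thread.

```python
from typing import List, Tuple


def chunk_ranges(start: int, max_index: int, step: int) -> List[Tuple[int, int]]:
    """Split the inclusive index span [start, max_index] into chunks of `step` items."""
    if step <= 0:
        raise ValueError("step must be positive")
    ranges = []
    lo = start
    while lo <= max_index:
        hi = min(lo + step - 1, max_index)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def build_commands(url: str, dest: str, start: int, max_index: int, step: int) -> List[List[str]]:
    """Build one gallery-dl argument list per chunk (nothing is executed here)."""
    return [
        ["gallery-dl", url, "-d", dest, "--range", f"{lo}-{hi}"]
        for lo, hi in chunk_ranges(start, max_index, step)
    ]
```

Each returned list can be passed to `subprocess.run(cmd)` one at a time. Note that chunking only changes where each job starts; if DA stops returning results past a certain offset, the later chunks will still come back empty.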
Also, it turns out that some of these artists with massive amounts of images have not organized every single image into folders, some can only be accessed from the artist's main gallery. I contacted the artist that I tested on, and they said they were at their technical limit for number of folders, which is why the majority of their newest images had been uploaded to their main gallery only.
Is this something that can be fixed on our end, or is this strictly DA's fault?
--range not properly working all the time is probably gallery-dl's fault, but the general problem of not being able to download more than 10k images per folder/collection is on DA and I don't think that can be fixed.
You can try scraping archive.org for links related to the account you want. It's best to use the CDX API to get a list of everything. DeviantArt URLs seem to always have the username either before or after the domain name. Years ago, accounts looked like https://username.deviantart.com/ rather than the current https://www.deviantart.com/username/, so check for both.
Here's the cdx api and how to use it: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md
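As a rough sketch of the CDX approach, the snippet below builds one query per URL layout (old subdomain style and current path style) and fetches the matching captures. The helper names are made up for illustration; the query parameters (`url`, `matchType`, `output`, `fl`, `collapse`) are the ones described in the CDX server README linked above, and the endpoint is assumed to be the public one at web.archive.org.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"  # assumed public endpoint


def cdx_query_url(url_pattern: str, match_type: str = "prefix") -> str:
    """Build a CDX query that lists each distinct captured URL under a prefix."""
    params = {
        "url": url_pattern,
        "matchType": match_type,
        "output": "json",
        "fl": "original,timestamp",
        "collapse": "urlkey",  # one row per unique URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def account_queries(username: str) -> List[str]:
    """Queries for both the old and the current DeviantArt URL layouts."""
    return [
        cdx_query_url(f"{username}.deviantart.com/"),
        cdx_query_url(f"www.deviantart.com/{username}/"),
    ]


def fetch_rows(query_url: str) -> List[List[str]]:
    """Download and parse one query; with output=json the first row is a header."""
    with urllib.request.urlopen(query_url, timeout=60) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return rows[1:]
```

Calling `fetch_rows(q)` for each `q` in `account_queries("username")` should give `[original, timestamp]` pairs to sift through; large accounts may need the paging options described in the README.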
I opened an issue for the DeviantArt API related to this on DA's GitHub page, but nobody's responded yet. I'll have to try the archive.org approach when I get time; I hope it's not too bad.
I'm having the same issue. The artist I'm trying to download from has nearly 23k deviations, and it stopped after around 10k had been downloaded. If any new ones are uploaded, they download just fine, but nothing below that point does.
gallery-dl https://www.deviantart.com/ChloeDH1001/ -d DRAW002
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024-
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024- -o flat=false
The last command yielded unique results that haven't been downloaded yet, but just say that there are no results for the URL.
10k working entries, are they (roughly) based on age? Or, asked differently, do the older ones of the alleged 23k deviations work in the browser?
On my end, from both my phone and a [very old and temporary] Windows 7 laptop, it's very hard to tell. I don't know if it's an OS issue, but on Windows 10 the downloaded files get their date-modified attribute set to the date the artist uploaded the image. That doesn't happen on Windows 7, and I can't seem to sort the files by date properly either on that old laptop or in the DeviantArt web client (Vivaldi, the latest release available for Windows 7, which is not the latest version of the browser overall since newer versions don't support it).
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 --range 10024- -o flat=false
The last command yielded unique results that haven't been downloaded yet, but just say that there are no results for the URL.
Run this last command without --range. This will fetch deviations from each individual folder instead of /all and not start from offset 10024 for each one of them.
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 -o flat=false
that doesn't happen on Windows 7
Try --mtime date. Maybe that works, although it uses a different mtime value than the default settings would.
Run this last command without --range. This will fetch deviations from each individual folder instead of /all and not start from offset 10024 for each one of them.
gallery-dl https://www.deviantart.com/ChloeDH1001/gallery/all -d DRAW002 -o flat=false
That did download new stuff, but only 8678 files, compared to the full ~23K files. A total of 18,988 files downloaded; 8678 new, 10,219 already downloaded. I also have duplicate files now, and I can't find a way for gallery-dl to check only for new files, since it has to go through the entire list to see what has already been downloaded before downloading anything new.
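One way to at least locate the duplicate files afterwards is to group the downloaded files by content hash. This is a generic sketch that works on any directory and is not a gallery-dl feature; the function names are made up for the example, and it only reports duplicates without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of files under `root` whose contents are byte-identical."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

For example, `find_duplicates(Path("DRAW002"))` would list every set of identical files under the download directory used above, for instance the same image saved once from /all and once from a folder.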
Is there a maximum number of images that can be downloaded in a single session from a single artist or folder? I've noticed that for artists with large galleries of more than 10k or so images, I can never seem to reach 10k downloaded files; it stops around 9.5k or so. But if I download from individual folders on that artist's profile, provided those folders have <9.5k images, I can download everything just fine. However, it's much easier to keep up with the latest images of an artist if I can download straight from their entire profile, so having to download from individual folders every time is non-ideal, especially when often times these artists don't properly update their individual folders, and many of their images may only be accessible within the main profile.