mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

(DeviantArt - Extractor) Not downloading "AntiSocial" images (ones not in a folder) during full Gallery rip? #959

Closed John7732 closed 4 years ago

John7732 commented 4 years ago

First off, let me just say how grateful I am for all the work you put into Gallery-dl (it is an amazing tool!), and just a heads up that I am new to posting issues on GitHub so I'm sorry if I do anything wrong.

So I recently noticed that ripping an entire DeviantArt gallery was still missing some of my favorite art. It seems these particular images can only be found in the "all" section and are not in "featured" or any other folder of the gallery. I explored the meta files and noticed they were pulling "isBlocked = true" and "isAntiSocial = true" tags with "subcategory: deviation" instead of the usual "subcategory: folder", and that the images fitting this pattern all had no folders assigned to them. So they are not detected in a full gallery rip, but of course manually downloading each individual URL works fine. Adding them to my own personal favorites "collection" and ripping the collection is another makeshift method that works.

I'm guessing right now Gallery-dl is only checking "featured" and the other gallery folders for art, so I was just wondering how I might go about getting Gallery-dl to detect the images in "all" that are not in any gallery folders during a full gallery rip? Let me know if more info is needed, and thank you very much for your time.

biznizz commented 4 years ago

Several different factors come into play when downloading DA galleries, since that site is a pain in the butt to deal with.

Could you post your deviantart extractor settings from your configuration file here? Just copy it and paste it here directly, then highlight it and press the "insert code" button to get the code to look like this (this example shows the stock extractor settings from the example configuration files in the bin folder):

        "deviantart":
        {
            "extra": false,
            "flat": true,
            "folders": false,
            "journals": "html",
            "mature": true,
            "metadata": false,
            "original": true,
            "quality": 100,
            "wait-min": 0
        },

Basically, what are your current extractor settings, and can you post the gallery that you are having an issue with, for testing purposes?

John7732 commented 4 years ago

Sure! Here are the DeviantArt extractor settings I'm using, with client info and cookie file paths censored out:

        "deviantart": {
            "cookies": "*censored cookie file path*",
            "cookies-update": true,
            "disable-include": "all",
            "include": "gallery,deviations,scraps,journal,stash",
            "refresh-token": "cache",
            "client-id": "*censored value*",
            "client-secret": "*censored value*",
            "extra": true,
            "flat": false,
            "folders": false,
            "journals": "html",
            "mature": true,
            "metadata": true,
            "original": false,
            "quality": 100,
            "wait-min": 0,

            "scraps": {
                "cookies": "*censored cookie file path*",
                "cookies-update": true
            },

            "stash": {
                "cookies": "*censored cookie file path*",
                "cookies-update": true,
                "directory": [ "DeviantArt", "{author[username]}", "Sta.sh" ],
                "filename": "{title}_{index}_.{extension}"
            },

            "journal": {
                "metadata": false,
                "extra": false,
                "directory": [ "DeviantArt", "{author[username]}", "Journals" ],
                "filename": "{title}_{index}_.{extension}"
            },

            "favorite": {
                "metadata": false,
                "extra": false,
                "directory": [ "DeviantArt", "{author[username]}", "Favorites" ],
                "filename": "{title}_{index}_.{extension}"
            },

            "collection": {
                "metadata": false,
                "extra": false,
                "directory": [ "Collections", "{collection[owner]}", "{collection[title]}", "{author[username]}" ],
                "filename": "{title}_{index}_.{extension}"
            },

            "directory": [ "DeviantArt", "{author[username]}" ],
            "filename": "{title}_{index}_.{extension}"
        },

Keep in mind that most galleries are actually ripping 100% correctly, and this is the first time I noticed one that was not including every single file. I think by default most Deviations are included in at least the "featured" folder, unlike here, so they work. Even here, most files are being included correctly.

<Warning: Gallery example is mostly SFW but some art might be considered to have NSFW themes (though you usually don't actually see anything and is mostly for cute/humor)>

Hopefully they don't mind me sharing their gallery like this, but here is the one that caught my attention: "https://www.deviantart.com/vcfgr"

Here are some specific examples of Deviations not being included in the Gallery rip (there are more though):
https://www.deviantart.com/vcfgr/art/Paper-Doll-164861339
https://www.deviantart.com/vcfgr/art/Box-O-Fun-59615686
https://www.deviantart.com/vcfgr/art/Bakuhatsu-31457978

biznizz commented 4 years ago

Okay, good: you're using a cookies.txt file for your DA cookies, as well as the refresh-token and the client-id/client-secret from the app.

According to the configuration documentation for deviantart, include only accepts gallery, scraps, journal and favorite as values, so "deviations" and "stash" aren't recognized there. Any sta.sh links included in description texts or journals are automatically downloaded as submissions when extra is set to true.
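As a rough illustration of the point above, here is a small Python sketch that splits an include string into recognized and unrecognized names. The accepted set is copied from this comment rather than read from gallery-dl, and `split_include` is a hypothetical helper, not part of the program.

```python
# Values this thread names as accepted for "include". This set is an
# assumption taken from the comment above, not read from gallery-dl itself.
# "favorite" is written in the singular, matching the "favorite" section
# name in the settings posted earlier.
ACCEPTED = {"gallery", "scraps", "journal", "favorite"}


def split_include(value):
    """Split a comma-separated include string into (accepted, unrecognized)."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    accepted = [name for name in names if name in ACCEPTED]
    unrecognized = [name for name in names if name not in ACCEPTED]
    return accepted, unrecognized


# The include string from the settings posted in this thread.
accepted, unrecognized = split_include("gallery,deviations,scraps,journal,stash")
print("accepted:", accepted)
print("unrecognized:", unrecognized)
```

Running a check like this before a long rip makes it easier to spot names that would likely be ignored.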

I'm also unsure that you need the extra settings for scraps, stash, journal, favorite, or collection either, but that would be an answer for mikf, the dev.

Let's see... I ran gallery-dl https://www.deviantart.com/vcfgr and downloaded nearly everything. My cookies updated incorrectly after the first run while it was fetching scraps, but it got everything fine after a cookie re-export and re-run. Are your cookies updating correctly when you rip full DA galleries? And are you running the rip with the url for the user's page (like I did), or are you running the "gallery/all" URL?

John7732 commented 4 years ago

Yeah mostly was just testing/trying stuff for the includes when I noticed the meta files said "subcategory:deviation" for the missing ones, but that's good to know. Figured adding the "Extra" settings at least wouldn't hurt anything just in case it actually helped. I'll try to clean that up later.

Did your rips end up including those 3 missing Deviation examples (like the Bakuhatsu / Paperdoll ones)? Well I believe the Cookies have been updating correctly but how would you be able to tell? I have been able to rip the "Stash" (separate link not included) and "Scraps" files no problem and also already tried re-exporting the cookies and re-running without much luck regarding the missing files. Mature files are also being ripped fine so with all that working I was assuming cookies were updating. Feels like it digs and rips everything fine except it just doesn't look into the "all" section of the Gallery causing files only found there to be missed.

For the rip URLs I tried both: "https://www.deviantart.com/vcfgr" (my default go to for most galleries that seems to work best) and "https://www.deviantart.com/vcfgr/gallery/all"

Felt like the "all" url actually read less data / went quicker (but might be wrong).

biznizz commented 4 years ago

Yes, during my test rips, the three deviations you listed were ripped the first time. As I said, the only issue I ran into was that, midway through getting scraps, my cookies updated incorrectly, requiring a re-export and re-run, after which it got the missing ones. The only other errors I got were for the TTF font files that were turned into txt files (which isn't a problem; just rename the extensions back to .ttf and they go back to normal).

"Well I believe the Cookies have been updating correctly but how would you be able to tell?"

This is the reason I use a dedicated txt file holding only the cookies for sites that require them for ripping, as opposed to the full exported txt. It's easy to see if your DA cookies updated incorrectly by looking at the userinfo cookie.

.deviantart.com TRUE / FALSE 1562335230 userinfo __123456789123456789%3B%7B%22username%22%3A%22USERNAME%22%2C%22uniqueid%22%3A%227eabbaabbaabba1234567889aae%22%2C%22dvs9-1%22%3A1%2C%22ab%22%3A%22tao-DWA-1-a-2%22%7D

As you can see in this example, your DA username (USERNAME in the value above) is part of the cookie. When the cookies update incorrectly, the actual user name disappears from the value.

Using Notepad++ and having the cookies.txt file open in the program will let you know when the file has had changes applied, and it will ask to reload. That way, you know when to check whether the cookies updated incorrectly.
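The manual check described here could also be scripted. The following is a minimal Python sketch that decodes a userinfo line from a cookies.txt file and reports the embedded username. It assumes the value layout seen in the example cookie in this comment (a prefix, a semicolon, then a JSON object); `userinfo_username` is a hypothetical helper and DeviantArt may change the format at any time.

```python
import json
from urllib.parse import unquote


def userinfo_username(cookie_line):
    """Return the username embedded in a userinfo cookie line, or None."""
    # cookies.txt rows carry 7 whitespace-separated fields; the cookie
    # name and value are the last two.
    fields = cookie_line.split()
    if len(fields) != 7 or fields[5] != "userinfo":
        return None
    decoded = unquote(fields[6])
    # Assumed layout, based on the example in this thread: "<prefix>;<JSON>"
    _, sep, payload = decoded.partition(";")
    if not sep:
        return None
    try:
        info = json.loads(payload)
    except ValueError:
        return None
    if not isinstance(info, dict):
        return None
    # An empty or missing username is treated as a bad update.
    return info.get("username") or None


SAMPLE = (
    ".deviantart.com TRUE / FALSE 1562335230 userinfo "
    "__123456789123456789%3B%7B%22username%22%3A%22USERNAME%22%2C"
    "%22uniqueid%22%3A%227eabbaabbaabba1234567889aae%22%2C"
    "%22dvs9-1%22%3A1%2C%22ab%22%3A%22tao-DWA-1-a-2%22%7D"
)
print(userinfo_username(SAMPLE))
```

A result of None for your own cookie line would suggest the username has dropped out of the value and that a re-export is worth trying.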

John7732 commented 4 years ago

So the cookie looked correct, with the username still showing my actual username, and it seemed to be in the right format (plus it's been successfully ripping "Mature Scraps", which wouldn't work otherwise, right?). I wanted to test without the config file to hopefully isolate the issue, thinking maybe the cookie was not working, so I called --cookie path "url" in the command window and got the missing files... I thought I remembered something about the cookie only being needed for "Mature" Scraps files, so I tried without the cookie... and it still ripped the missing files correctly... so that probably meant it was something in the config file. I compared it to the default values, found mine had ("flat": false), and changed it back to the default ("flat": true), which made it work properly and rip the missing files. I guess ("flat": false) was "Collecting a list of all gallery-folders or favorites-collections" to rip from, and since these were not technically in a "gallery folder" they were being missed, maybe? Idk lol.

TLDR: Simply went into config file and changed ("flat": false) back to default ("flat": true) value and it works now (yay!!)
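For anyone who prefers to script the same change, here is a small Python sketch that sets the flat option back to true on a loaded configuration. It assumes the usual layout where the deviantart settings sit under an "extractor" section; `restore_flat_default` is a hypothetical helper and the inline JSON stands in for a real config file.

```python
import json


def restore_flat_default(config):
    """Set extractor.deviantart.flat to true and return the previous value."""
    section = config.setdefault("extractor", {}).setdefault("deviantart", {})
    previous = section.get("flat")
    section["flat"] = True
    return previous


# Stand-in for a config file; only the relevant keys are shown.
config = json.loads(
    '{"extractor": {"deviantart": {"flat": false, "folders": false}}}'
)
previous = restore_flat_default(config)
print("flat was", previous, "and is now", config["extractor"]["deviantart"]["flat"])
```

Editing the file by hand, as described above, has the same effect; the sketch only helps when several config files need the same fix.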

Yea I was glad to see it still ripped the TTF font files as .txt instead of just ignoring them, doing a batch convert back to TTF shouldn't be too hard. Thank you so much for your time!! I really appreciate all of your help.

biznizz commented 4 years ago

No prob, glad you figured it out.

I never really bothered messing around with the flat option myself, so I probably wouldn't have noticed it until after a while of comparing my extractor settings to yours.

The cookie is for scraps in general, I believe, since the mature toggle is for all mature images regardless of whether they're scraps or not. When my cookie got corrupted in the update, it was during the rip of the scraps section, so it had started ripping scraps but stopped before getting them all. When new cookies were put in and it was run again, it again refreshed during the scraps rip, but didn't get corrupted. I've mostly only had cookie corruptions during updates when ripping multiple whole galleries; I can't recall it happening during an individual image rip of either a deviation or a scrap.

The extractor largely only recognizes image file types for deviantart, so unfamiliar file types like .ttf or .psd get converted to txt, and the command window shows that the program didn't recognize the MIME type of the item and thus converted it to a txt. That's better than it just being skipped over.
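The batch rename back to .ttf mentioned in this thread could look something like the Python sketch below. It assumes only the extension changed and the file contents are intact, and it identifies fonts by the leading bytes that TrueType files start with. `restore_fonts` and `looks_like_truetype` are hypothetical helpers, and the default dry run only reports what would be renamed.

```python
from pathlib import Path

# Leading bytes of a TrueType font file (two common variants).
TRUETYPE_MAGIC = (b"\x00\x01\x00\x00", b"true")


def looks_like_truetype(path):
    """Return True if the file starts with a TrueType signature."""
    with open(path, "rb") as handle:
        return handle.read(4) in TRUETYPE_MAGIC


def restore_fonts(directory, dry_run=True):
    """Rename *.txt files that look like TrueType fonts to *.ttf.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing on disk is changed.
    """
    renamed = []
    for path in sorted(Path(directory).glob("*.txt")):
        if not looks_like_truetype(path):
            continue  # an ordinary text file, leave it alone
        target = path.with_suffix(".ttf")
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling `restore_fonts("some/download/folder")` first and checking the returned list before re-running with `dry_run=False` keeps the rename reversible by inspection. Other unrecognized types such as .psd would need their own signature check.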

Good luck with using the program to rip your fav galleries!