mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.39k stars 930 forks

Deviantart: No longer downloads full-size/original image for non-downloadable art #4652

Open spectrefps opened 11 months ago

spectrefps commented 11 months ago

This is mostly outdated now (basically everything in italics), see 'UPDATE 3' within this post at the bottom for the most recent status. It looks like DeviantArt has blocked gallery-dl from downloading full-size/original images for now.

For this artwork on DA (https://www.deviantart.com/riikozor/art/Commission-Nazek-340156337), it tries the URL (increasing delay after each attempt), and then produces the '403 forbidden' warning and resorts to a fallback-URL which results in a smaller version of the image.

I remember getting the '403 and fallback-URL' issue on only a few URLs before the recent update, and since then, it's been working like a charm on all of them until this one. Maybe this is one of the last 'problem URLs' for Deviantart? (as long as they don't go and change up everything on their end again) XD

UPDATE: Within the same gallery, this artwork (https://www.deviantart.com/riikozor/art/Draenei-466352463) also produces the issue. However, other artworks within the same gallery download fine.

This one also: https://www.deviantart.com/kimisz/art/Commission-Draenei-Death-Knight-327645028

It may just be coincidence, but this seems to occur on older artworks (2012 or earlier). Maybe they are using a different (perhaps 'legacy') image URL format or syntax for older images like these (from before they changed stuff on their backend for newer images/artworks)?

UPDATE 2: As Twi-Hard posted, this issue appears to occur on most DA pages now (that lack the 'Download' button). DA changed something on their end again? I will list some of the links of artworks this issue now occurs on.

https://www.deviantart.com/pocketstash/art/C-Draenei-881116926 https://www.deviantart.com/kirshivanilla/art/Ralakia-Bust-up-Portrait-944608016 https://www.deviantart.com/pufikn/art/Draenei-commission-847056249 https://www.deviantart.com/pufikn/art/Commission-897446832 https://www.deviantart.com/pufikn/art/draenei-827271193

UPDATE 3 10/20/2023

It looks like you can no longer download any full-size images for non-downloadable art (those that lack the "Download" arrow/button). After trying all of the suggestions in this topic, it only returns the preview-size art (a fraction of the original/full dimensions).

Twi-Hard commented 11 months ago

I've tried it on many accounts. No non-downloadable images can be obtained in full res anymore. I've seen this happen with images that are non-downloadable but have free downloads too. Example of non-downloadable images that have free downloads (very nsfw): https://www.deviantart.com/twilightsparklelewds. I was able to get NSFW stuff yesterday.

spectrefps commented 11 months ago

> I've tried it on many accounts. No non-downloadable images can be obtained in full res anymore. I've seen this happen with images that are non-downloadable but have free downloads too. Example of non-downloadable images that have free downloads (very nsfw): https://www.deviantart.com/twilightsparklelewds. I was able to get NSFW stuff yesterday.

I've been using it exclusively on SFW/NSFW non-downloadable images (the ones that don't have the "Download" arrow/button on the page) for hours now today and it has been working fine so far. I've only run into this issue on the few artworks I've listed above. I am seeing a pattern though: each of them is relatively old (from 2012 or earlier). Non-downloadable images from more recent years have been working fine for me today. Maybe the older URLs have a different format or syntax (or their system handles the process differently for old artworks)?

spectrefps commented 11 months ago

Scratch that, I had just used it earlier today without issue, and now it gives the 403 error/fallback URL for every URL as you said. I will update the main post to reflect this recent change of behavior.

a-washing-machine commented 11 months ago

I've been updating my gallery folder for the past 5 days, and only today am I starting to get a WALL of warnings as described above. It was still fine at like 2am, not anymore at ~4pm.

Sigh, and I was almost through with updating it too, especially with gallery-dl downloading from deviantArt so much faster all of a sudden! Ah well. -_-

spectrefps commented 11 months ago

> I've been updating my gallery folder for the past 5 days, and only today am I starting to get a WALL of warnings as described above. It was still fine at like 2am, not anymore at ~4pm.
>
> Sigh, and I was almost through with updating it too, especially with gallery-dl downloading from deviantArt so much faster all of a sudden! Ah well. -_-

Ug you too? Yeah I still had some OLD art from DA that I was hoping to update (mostly WoW Draenei art) that was the compressed low-res "-fullview" version (long before I ever heard of gallery-dl, back in 2014-2015 XD). And I had used it just a few hours ago with no issues. Now suddenly many arts (strangely, not -all- of them) are giving the 403 error and resorting to the low-res fallback URL.

mdashlw commented 11 months ago

oh man. can confirm with downloadable (https://www.deviantart.com/starsbursts/art/Twi-cereal-987543936) and non-downloadable images (https://www.deviantart.com/buvanybu/art/Rainbow-Dash-987316641). see #4548 for additional context

Ironchest337 commented 11 months ago

It'll likely be a long time before another way is found for non-downloadable images. That original method was a lucky slip in the cracks that went unnoticed for nearly 2 years. With them finding one it was only a matter of time before they found the next that used the exact same method. It was also the easiest method to implement by far with others being inapplicable or being an extremely tedious process.

mdashlw commented 11 months ago

> It'll likely be a long time before another way is found for non-downloadable images. That original method was a lucky slip in the cracks that went unnoticed for nearly 2 years. With them finding one it was only a matter of time before they found the next that used the exact same method. It was also the easiest method to implement by far with others being inapplicable or being an extremely tedious process.

Could you elaborate on "others"? Is there any way at all to download some non-downloadable images right now?

Ironchest337 commented 11 months ago

> Could you elaborate on "others"? Is there any way at all to download some non-downloadable images right now?

As far as I can tell: No. We cannot get non-downloadable images right now. Full-res ones at least.

I can't name all the JWT vulnerabilities off the top of my head, but here are some I remember:

1) Passing a key without a signature at all (somewhat different from the none algorithm, I believe). Doesn't work, as DA checks the whole token.
2) Swapping the key type passed in. Only works when the original key is asymmetric (a public and private key pair) and the altered key is symmetric. HS256 is symmetric, so it does not work.
3) Header injection or other header vulnerabilities. Essentially requires certain headers to exist in the JWT that can be modified for code execution or to access a different file path. I'm not sure of all the details, but since the JWTs used by Deviantart do not have these headers, it might also be the case that even if you add them, Deviantart has no reason to check them and will simply ignore them.
4) Brute force. Pretty self-explanatory and really the only method we have at this point. Only viable if Deviantart uses a weak signing key. Can confirm that it does not use any of the common or default keys, so it would be a long time before something turns up.

For these reasons it's probably better to find something with the website itself as opposed to the tokens.
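To make points 1 and 4 a bit more concrete, here is a minimal, generic sketch of HS256 verification using only the Python standard library. It is not DeviantArt's code: the key, the payload, and the helper names (`make_token`, `verify_hs256`) are all made up for illustration. It only shows why a token with the `none` algorithm, or one signed with a guessed key, is rejected by a server that checks the whole token.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(payload: dict, alg: str, key: bytes = b"") -> str:
    """Build a compact JWT; with alg 'none' the signature part stays empty."""
    header = {"typ": "JWT", "alg": alg}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    )
    if alg == "none":
        return signing_input + "."
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_hs256(token: str, key: bytes) -> bool:
    """Strict check: accept only HS256 tokens whose signature matches the key."""
    try:
        head_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(head_b64))
        signature = b64url_decode(sig_b64)
    except ValueError:
        return False
    if header.get("alg") != "HS256":
        return False  # rejects 'none' and any other algorithm
    expected = hmac.new(
        key, (head_b64 + "." + body_b64).encode("ascii"), hashlib.sha256
    ).digest()
    return hmac.compare_digest(expected, signature)


# Hypothetical key and payload, purely for the demonstration.
SERVER_KEY = b"hypothetical-server-secret"
payload = {"sub": "example"}

signed = make_token(payload, "HS256", SERVER_KEY)
unsigned = make_token(payload, "none")
guessed = make_token(payload, "HS256", b"guessed-key")

print(verify_hs256(signed, SERVER_KEY))
print(verify_hs256(unsigned, SERVER_KEY))
print(verify_hs256(guessed, SERVER_KEY))
```

With a strict verifier like this, only the token signed with the real key passes, which matches the observation that brute-forcing the key is the only token-side option left.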

oxi7589 commented 11 months ago

> Could you elaborate on "others"? Is there any way at all to download some non-downloadable images right now?

Yes, there is such a way, at least for some cases. NSFW images for which there is a "Free download" option available can usually be downloaded using the private token (gallery-dl -o public=false {url}, assuming client-id and client-secret are present in the config and OAuth authorization has been performed). Bear in mind that the quota for that is very low, and you will likely face issues when using it to download entire large galleries.
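For anyone scripting this, a small sketch of how the command above could be assembled from Python. The helper name `build_command` is made up, and the URL is just one of the examples from this thread. It only builds and prints the command; whether the download succeeds still depends on the OAuth setup and the quota mentioned above.

```python
import shlex


def build_command(url: str, use_private_token: bool = True) -> list[str]:
    """Return the argv list for a gallery-dl run against one URL."""
    command = ["gallery-dl"]
    if use_private_token:
        # -o sets a config option for this run only; public=false asks the
        # DeviantArt extractor to use the private (OAuth) token.
        command += ["-o", "public=false"]
    command.append(url)
    return command


url = "https://www.deviantart.com/riikozor/art/Draenei-466352463"
command = build_command(url)
print(shlex.join(command))

# To actually run it (requires gallery-dl on PATH):
#   import subprocess
#   subprocess.run(command, check=False)
```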

Twi-Hard commented 11 months ago

Something I've been thinking.. considering free downloads died at the exact same time as full res images they might be related? Fixing free downloads might indirectly fix full resolution? I was downloading both types of content when it stopped working and that was the first time the fallback url showed up in my logs.

Ironchest337 commented 11 months ago

> Something I've been thinking.. considering free downloads died at the exact same time as full res images they might be related? Fixing free downloads might indirectly fix full resolution? I was downloading both types of content when it stopped working and that was the first time the fallback url showed up in my logs.

I would love if that was the case but I don't think so. I always used my own program that simply manipulated the deviantart image link into what I needed to get an image, and considering that's no longer working and is the core part of how gallery-dl's extraction works, it's probably been patched out, at least with regard to non-downloadable images and anything obtained using the none algorithm.

mikf commented 11 months ago

To get better results with gallery-dl than with default settings:

sbobbo commented 11 months ago

Is there a way to get it to skip these pictures rather than download the blurry thumbnail?

spectrefps commented 11 months ago

I'm going through the "refresh-token" link and trying to make sense of it. If I understand it correctly, I need the OAuth part as well? And then I need to link it to my DeviantArt account and add a value it provides into my config file somewhere. I do have a cookies field in my config file (in the DeviantArt section) with a cookie key/value pair, but perhaps they expired by now and need to be updated/re-exported?

Update: I checked just now and the cookie value for "auth_secure" (what I'm guessing is the cookie label for DeviantArt) is the same as what I already have in the config file (expires sometime in 2024 according to the table, so that makes sense I guess). Still not sure how to get the refresh-token part to work. I will try to get OAuth working and see if it makes any sense.

spectrefps commented 11 months ago

UPDATE 2: Got the refresh-token + OAuth (apparently) working, added the "refresh-token" and "jwt" lines/settings in the Deviantart portion of the config file. Now, it downloads the image without the 403 error+fallback url warning. However, the images downloaded are either:

A) smaller dimensions & filesize than the full size original (versus the original's listed dimensions/filesize below the art description),

--or--

B) the correct dimensions, but different file size (so far, a larger file size than the original). For case B, I am not sure why the file downloaded is larger than the original. If anything, wouldn't DeviantArt try to force the user to get a compressed/smaller copy?

This behavior seems to vary from image to image, and I am not sure what is triggering behavior "A" for some images and behavior "B" for others.
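For reference, a rough sketch of what the DeviantArt part of the config might look like with the settings mentioned above, generated from Python so it can be printed and pasted. All values are placeholders, and the helper name `deviantart_fragment` is made up; the fragment would still need to be merged into an existing gallery-dl config by hand.

```python
import json


def deviantart_fragment(refresh_token: str, client_id: str, client_secret: str) -> dict:
    """Build the extractor.deviantart section with the options from this thread."""
    return {
        "extractor": {
            "deviantart": {
                # Placeholders: use the values of your own registered application.
                "client-id": client_id,
                "client-secret": client_secret,
                # Value obtained through the OAuth step.
                "refresh-token": refresh_token,
                # The "jwt" setting mentioned above.
                "jwt": True,
            }
        }
    }


fragment = deviantart_fragment(
    refresh_token="YOUR_REFRESH_TOKEN",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
print(json.dumps(fragment, indent=4))
```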

spectrefps commented 11 months ago

Has anyone been able to identify the reason why some artworks download in the full dimensions (but larger file size) while others only download a fraction of full dimensions? At first I thought that there was something to do with the original image's dimensions (larger ones specifically), but that wound up not always being the case (some larger images still downloaded with the full dimensions).

sbobbo commented 11 months ago

You're getting some to download without the weird fuzzy filter over the whole photo?

spectrefps commented 11 months ago

I don't think there is a filter on them, but it may be subtle enough that I just don't see it. I did notice that the downloaded images (at least, the ones that are the same dimensions as the original) are a larger file size than the file size listed on the art's page. Other arts that I try (using the method recommended by mikf above) simply download an image with smaller dimensions (bigger than the default 'preview' version, but smaller than the full-size original).

mdashlw commented 11 months ago

> Has anyone been able to identify the reason why some artworks download in the full dimensions (but larger file size) while others only download a fraction of full dimensions? At first I thought that there was something to do with the original image's dimensions (larger ones specifically), but that wound up not always being the case (some larger images still downloaded with the full dimensions).

Some older (older than March 2019) non-downloadable images are still able to be downloaded in higher resolution (higher than preview but not original)

colin-heberling commented 11 months ago

>> Could you elaborate on "others"? Is there any way at all to download some non-downloadable images right now?
>
> Yes, there is such a way, at least for some cases. NSFW images for which there is a "Free download" option available can usually be downloaded using the private token (gallery-dl -o public=false {url}, assuming client-id and client-secret are present in the config and OAuth authorization has been performed). Bear in mind that the quota for that is very low, and you will likely face issues when using it to download entire large galleries.

This worked for me. I'm still getting the "[downloader.http][warning] '403 Forbidden' for..." warning for nearly every file, but strangely, simply adding the "-o 'public=false'" option started downloading everything at the normal resolution again, and for an artist where I was subscribed, it actually used my credentials to download the nsfw images instead of blurry ones, as before. I haven't tested for an artist where I am not subscribed yet, but I anticipate it will still download blurry images.

Nephiro commented 11 months ago

>>> Could you elaborate on "others"? Is there any way at all to download some non-downloadable images right now?
>>
>> Yes, there is such a way, at least for some cases. NSFW images for which there is a "Free download" option available can usually be downloaded using the private token (gallery-dl -o public=false {url}, assuming client-id and client-secret are present in the config and OAuth authorization has been performed). Bear in mind that the quota for that is very low, and you will likely face issues when using it to download entire large galleries.
>
> This worked for me. I'm still getting the "[downloader.http][warning] '403 Forbidden' for..." warning for nearly every file, but strangely, simply adding the "-o 'public=false'" option started downloading everything at the normal resolution again, and for an artist where I was subscribed, it actually used my credentials to download the nsfw images instead of blurry ones, as before. I haven't tested for an artist where I am not subscribed yet, but I anticipate it will still download blurry images.

I have everything set correctly (client-id, client-secret, etc.), but even adding public=false doesn't change anything for me. This is my example (https://www.deviantart.com/axlhearts/art/Nurse-GF-Zoey-988437631). It still downloads just the preview.

spectrefps commented 11 months ago

Same, looks like downloading full-size art is blocked by DeviantArt at the moment. Every non-download art (missing the Download button) now seems to download only the preview size.

left1000 commented 11 months ago

Ugh, even worse than not working, it's downloading all the previews of all the non-downloadable art, for all the deviations that I already have the full image of, and my archive should already be warning it not to redownload.... but since the preview is different (worse) than the image I already have it redownloads it, leaving me to now have to manually delete 100s or 1000s of images I didn't want :(

a-washing-machine commented 3 months ago

> Ugh, even worse than not working, it's downloading all the previews of all the non-downloadable art, for all the deviations that I already have the full image of, and my archive should already be warning it not to redownload.... but since the preview is different (worse) than the image I already have it redownloads it, leaving me to now have to manually delete 100s or 1000s of images I didn't want :(

@left1000 Did you ever find a workaround for that? I'd like to update my gallery-downloads on deviantArt, but I'd want to avoid the situation you described; I don't want to clutter my downloads with unwanted lower-resolution duplicates for images I already have, yet I don't want to miss out on new downloads even if the only version available currently is low res.

Also, be aware it's possible via config to rename the low-res files to something like deviantart_123456789_LOWRES_artworktitle.jpg to keep them separate (and plan ahead for a possible HD fix years from now), see #4770.

It's just that this doesn't solve the clutter issue for images I already have in HD. :-/

left1000 commented 3 months ago

gallery-dl stopped downloading the blurry previews many versions ago, not sure what merge it was that changed that behavior

a-washing-machine commented 3 months ago

I think we're talking about two different things here.

> Ugh, even worse than not working, it's downloading all the previews of all the non-downloadable art, for all the deviations that I already have the full image of, and my archive should already be warning it not to redownload.... but since the preview is different (worse) than the image I already have it redownloads it, leaving me to now have to manually delete 100s or 1000s of images I didn't want :(

^^ This is what I'm trying to avoid. I don't want gallery-dl to re-download smaller versions of images I already have in HD.

I'm guessing the filename/extension for previews and HD images was/is different, otherwise it wouldn't be re-downloading them since it'd just go "file by that name already exists".

...Though unfortunately, I cannot rely on "file by that name already exists" to prevent duplicate downloads, because I've set up my config to add an extra infix to the filename whenever gallery-dl detects it isn't getting the HD image (yes it could already do that, it's described here).

So, say, deviantart_123456789_artworktitle.jpg becomes deviantart_123456789_LOWRES_artworktitle.jpg if it isn't getting the HD version. My reason for this is that, should HD download ever become possible again (even years from now), I can then simply find all LOWRES images and redownload only those in HD.

This however causes gallery-dl to download LOWRES duplicates of files I already have.

So... how do you deal with that to prevent having to manually delete unwanted duplicates like that? Ideally, I wouldn't want gallery-dl to even download them to begin with.
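Not a way to prevent the downloads, but as a stopgap, the cleanup itself could be scripted. Below is a minimal sketch that assumes exactly the naming scheme described above (deviantart_<id>_<title> for HD, deviantart_<id>_LOWRES_<title> for previews) and lists LOWRES files whose deviation ID already has an HD file in the same tree. The function name and the demo files are made up; it only reports candidates and deletes nothing.

```python
import re
import tempfile
from pathlib import Path

# Assumed filename scheme from the post above:
#   deviantart_<id>_<title>.<ext>          (HD)
#   deviantart_<id>_LOWRES_<title>.<ext>   (fallback preview)
NAME_RE = re.compile(r"^deviantart_(?P<id>\d+)_(?P<lowres>LOWRES_)?")


def find_lowres_duplicates(root: Path) -> list[Path]:
    """Return LOWRES files whose deviation ID also has an HD file under root."""
    hd_ids = set()
    lowres_files = []
    for path in root.rglob("*"):
        # Skip directories and metadata files such as *.json.
        if not path.is_file() or path.suffix == ".json":
            continue
        match = NAME_RE.match(path.name)
        if match is None:
            continue
        if match.group("lowres"):
            lowres_files.append((match.group("id"), path))
        else:
            hd_ids.add(match.group("id"))
    return sorted(path for dev_id, path in lowres_files if dev_id in hd_ids)


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in (
        "deviantart_111_foo.png",          # HD copy exists
        "deviantart_111_LOWRES_foo.jpg",   # redundant preview
        "deviantart_222_LOWRES_bar.jpg",   # only a preview, keep it
    ):
        (root / name).touch()
    demo_names = [p.name for p in find_lowres_duplicates(root)]

print(demo_names)
```

Reviewing the printed list before deleting anything seems safer than removing files automatically, since an artwork title that itself starts with "LOWRES_" would confuse the pattern.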

Twi-Hard commented 3 months ago

What I did was ask ChatGPT to make a script that goes through the metadata jsons, extracts the values necessary to create the archive file, then creates the archive file. The same can probably be done with folder/file names. If you go to make one, the data needs to go into the column "entry" with a "unique" constraint in the table "archive". The data entered needs to account for both the "archive-prefix" and "archive-format", which can be found with gallery-dl URL -E. I haven't said anything sooner because I didn't think what I'd have to say would be much help, but hopefully it can in some way. I use a custom "archive-prefix" and "archive-format", so this script would need editing, and it assumes you already have the metadata jsons.

import json
import os
import sqlite3

# Define the path to the directory and the SQLite database
directory_path = '/path/to/deviantart/folder'
database_path = '/path/to/deviantart/archive.sqlite3'

print(f"Connecting to the SQLite database at {database_path}...")
# Connect to the SQLite database
conn = sqlite3.connect(database_path)
cursor = conn.cursor()

print("Ensuring the 'archive' table exists...")
# Create the archive table if it does not exist
cursor.execute('''
CREATE TABLE IF NOT EXISTS archive (
    entry TEXT UNIQUE
)
''')

# Function to recursively traverse the directory and process JSON files
def process_directory(path):
    for root, dirs, files in os.walk(path):
        print(f"Processing directory: {root}")
        for file in files:
            if file.endswith('.json'):
                file_path = os.path.join(root, file)
                #print(f"Found JSON file: {file_path}")
                try:
                    with open(file_path, 'r', encoding='utf-8') as json_file:
                        file_contents = json_file.read().strip()
                        if not file_contents:
                            print("The file is empty. Skipping...")
                            continue
                        data = json.loads(file_contents)
                        author_userid = data.get('author', {}).get('userid')
                        author_username = data.get('author', {}).get('username')
                        deviationid = data.get('deviationid', data.get('statusid'))  # Use statusid if deviationid is missing
                        if author_userid and author_username and deviationid:
                            entry = f"deviantart.com, {author_userid}, {author_username}, {deviationid}"
                            print(f"Extracted entry: {entry}")
                            # INSERT OR IGNORE silently skips entries that are already
                            # archived, so no IntegrityError handling is needed.
                            cursor.execute('INSERT OR IGNORE INTO archive (entry) VALUES (?)', (entry,))
                            conn.commit()  # Commit after each insert
                            if cursor.rowcount == 0:
                                print("Entry already exists in the database. Skipping...")
                        else:
                            print("Incomplete data found in JSON. Skipping this file...")
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON in file {file_path}: {e}. Skipping this file...")

# Process the directory
process_directory(directory_path)

# Close the database connection
print("Closing the database connection...")
conn.close()
print("Script execution completed.")
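
For the "folder/file names" variant mentioned above, a rough sketch of how entries could be derived from filenames instead of metadata jsons. Everything here is an assumption to adapt: the filename pattern (deviantart_<id>_<title>.<ext>), the prefix "deviantart", and the format "{index}.{extension}" are placeholders, so check your own "archive-prefix" and "archive-format" first. The table layout mirrors the script above, and the demo uses an in-memory database rather than a real archive file.

```python
import re
import sqlite3

# Assumed filename scheme: deviantart_<id>_<title>.<ext>
FILENAME_RE = re.compile(r"^deviantart_(?P<index>\d+)_.*\.(?P<extension>[^.]+)$")


def entry_from_filename(name, prefix="deviantart", fmt="{index}.{extension}"):
    """Build an archive entry from a filename, or return None if it does not match.

    prefix and fmt are placeholders for your own archive-prefix / archive-format.
    """
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return prefix + fmt.format(**match.groupdict())


def add_entries(conn, names):
    """Insert entries for all matching filenames; return how many rows were new."""
    conn.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT UNIQUE)")
    entries = [(entry,) for entry in map(entry_from_filename, names) if entry]
    before = conn.total_changes
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT OR IGNORE INTO archive (entry) VALUES (?)", entries)
    return conn.total_changes - before


# Demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
names = [
    "deviantart_123456789_artworktitle.jpg",
    "deviantart_123456789_artworktitle.jpg",  # duplicate, ignored
    "notes.txt",                               # does not match the scheme
]
added = add_entries(conn, names)
rows = [row[0] for row in conn.execute("SELECT entry FROM archive")]
print(added, rows)
conn.close()
```

Running it against a copy of the archive file first, rather than the real one, would be the cautious way to try it.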