mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.85k stars, 975 forks

[deviantArt] [bug workaround] Handle low res images differently until / in case that full res download becomes viable again #4770

Open a-washing-machine opened 1 year ago

a-washing-machine commented 1 year ago

OR: If the HD image cannot be downloaded by current means, add a suffix to the filename to avoid a filename collision should HD downloads become possible again in the future


Regarding deviantArt closing the loophole that allowed downloading "non-downloadable" images in high resolution:

I'd like to request an option to have gallery-dl add a suffix to the filename of every file whose full resolution download is currently impossible due to DA's recent changes, indicating that the file is a low resolution fallback download.

Something like "deviantart_123456789_Some Artwork Title_LOW_RES_SUFFIX.jpg" that is easily findable. Optional via config file, I suppose.

This way, should full res downloads somehow become possible again at some point in the future (even years from now), there will be no filename collision between full res and low res downloads, and any missing HD images can be added to your existing download folder simply by re-parsing the galleries in your download queue without the abort-parameter.

I'd rather have the low res versions of images on my hard drive right now and sort them out when a higher resolution image becomes available later, than risk artworks being deleted before I can download them.

But I also want to avoid being unable to tell low-res and full res files apart, like what happened to me here: https://github.com/mikf/gallery-dl/issues/2846

( ...I never got around to doing that full reparse, it just would have taken up too much space AND taken too long. -_- )

Let me put it this way: I've already got tools I can use to sort out "which of these two images with the same deviantArt ID has the higher image resolution", and to do it recursively for a folder hierarchy ... as long as I actually have both the low res and high res images on my hard drive, and any presumed "lower resolution" images have a consistent, fixed, easily findable filename suffix that isn't likely to cause false positives with any artwork titles. ;)


Furthermore, it would be sensible to prevent downloading the low res file if the full res file is already on the system.

As pointed out here, https://github.com/mikf/gallery-dl/issues/4652#issuecomment-1773921752, that's what happens with the current version of gallery-dl: low-res fallback preview images are often (but not always!) JPG files, while the full resolution images downloaded earlier may not be, so the two filenames don't collide and both get saved. This leads to unwanted clutter.

Now, I don't mind too much if some images get downloaded twice because an artist changed an artwork's title (which avoids a filename collision with the existing file); that doesn't seem to happen as often as you'd think.

But it would be sensible to include a clause: "if (downloading a low-res fallback preview image) => check whether a file with the same filename (without the added suffix!) but a different file extension (JPG, PNG, GIF) already exists." If yes, skip.

(The exception to this is if the file on the hard drive is a non-image file, then sure, by all means do download the preview file too, I'm all for downloading both in non-image cases.)


In short:

If "deviantart_123456789_Some Artwork Title.png" already exists, DON'T download "deviantart_123456789_Some Artwork Title_LOW_RES_SUFFIX.jpg".

But if "deviantart_123456789_Some Artwork Title_LOW_RES_SUFFIX.jpg" exists, it SHOULD download "deviantart_123456789_Some Artwork Title.png" in the future should high res download become possible again.
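The skip rule above can be sketched in a few lines of Python. This only illustrates the requested behaviour and is not something gallery-dl provides: the helper `should_skip_low_res`, the `_LOW_RES` suffix and the set of image extensions are hypothetical choices for the example, and it works on a list of existing filenames instead of touching the disk.

```python
from pathlib import PurePath

# Hypothetical values chosen for this sketch, not gallery-dl settings.
LOW_RES_SUFFIX = "_LOW_RES"
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def should_skip_low_res(candidate: str, existing: list) -> bool:
    """Return True if the low res file `candidate` should not be downloaded.

    `existing` holds the filenames already present in the target folder.
    """
    stem = PurePath(candidate).stem
    if not stem.endswith(LOW_RES_SUFFIX):
        return False  # not a low res fallback, so no reason to skip
    base = stem[: -len(LOW_RES_SUFFIX)]

    for name in existing:
        other = PurePath(name)
        # Only a full res *image* with the same base name counts;
        # non-image files (e.g. .html) do not block the preview download.
        if other.stem == base and other.suffix.lower() in IMAGE_EXTENSIONS:
            return True
    return False


existing = ["deviantart_123456789_Some Artwork Title.png"]
print(should_skip_low_res(
    "deviantart_123456789_Some Artwork Title_LOW_RES.jpg", existing))  # True
```

Because the full res name never carries the suffix, the reverse case (a `_LOW_RES` file exists and the full res file becomes downloadable later) is never blocked by this check.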


Apologies for being a bit wordy, can't think of how to compress it further. :/

mikf commented 1 year ago

Short and incomplete answer for now: There is an is_original metadata field (#4559) that can be used in conditional filenames / directories to distinguish between low/full res images.

"filename": {
    "is_original": "filename",
    ""           : "LOW_RES filename"
}
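For reference, the conditional form works roughly like this, going by gallery-dl's configuration docs: each key is a Python expression evaluated against the file's metadata, the first true one selects its format string, and the empty key `""` is the fallback. The function below is a simplified emulation written for illustration (`pick_format` is a hypothetical name, not gallery-dl's internal code); "filename" and "LOW_RES filename" are the placeholders from the snippet above.

```python
from typing import Optional


def pick_format(conditions: dict, metadata: dict) -> Optional[str]:
    """Return the format string of the first true condition, else the "" entry."""
    for expr, fmt in conditions.items():
        # Metadata fields are visible as plain names inside expr.
        # eval() runs arbitrary code, so only use it on your own config.
        if expr and eval(expr, {}, metadata):
            return fmt
    return conditions.get("")  # fallback, None if there is no "" key


filename = {
    "is_original": "filename",
    "": "LOW_RES filename",
}

print(pick_format(filename, {"is_original": True}))   # filename
print(pick_format(filename, {"is_original": False}))  # LOW_RES filename
```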
Corrupt-Specturion commented 1 year ago

Short and incomplete answer for now: There is an is_original metadata field (#4559) that can be used in conditional filenames / directories to distinguish between low/full res images.

"filename": {
    "is_original": "filename",
    ""           : "LOW_RES filename"
}

Where do I put this / what do I do with this in config.json?

I tried

  "deviantart":
   {
       "filename":
       {
        "is_original": "filename",
        ""           : "LOW_RES filename"
       },
   },

But that results in the image file being literally named "filename". Putting it in the postprocessors section instead

   "deviantart":
   {
       "postprocessors":
       [
           {
               "name": "metadata",
               "event": "post,skip",
               "filename": "{index}.json"
           }
           {
               "filename":
               {
                "is_original": "filename",
                ""           : "LOW_RES filename"
       },
           }
       ]
   },

gives [config][error] JSONDecodeError when loading '/home/specturion/.config/gallery-dl/config.json': Expecting ',' delimiter: line 121 column 17 (char 3108).

I looked at #4559 but it is unclear.
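The JSONDecodeError above is a plain JSON syntax problem: in the postprocessors list, the first object's closing `}` is followed directly by the next `{` with no comma in between. The error text matches Python's `json.JSONDecodeError`, so the standard `json` module can be used to locate such mistakes before running gallery-dl. The fragments below are made up for the demo, and `find_json_error` is a hypothetical helper. Adding the comma alone would not be enough, though: as the reply below shows, `filename` belongs directly in the `deviantart` section, not inside a postprocessor entry.

```python
import json
from typing import Optional, Tuple


def find_json_error(text: str) -> Optional[Tuple[str, int, int]]:
    """Return (message, line, column) of the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.msg, err.lineno, err.colno
    return None


broken = '[{"name": "metadata"} {"filename": "x"}]'  # comma missing
fixed = '[{"name": "metadata"}, {"filename": "x"}]'

print(find_json_error(broken))  # ("Expecting ',' delimiter", 1, 23)
print(find_json_error(fixed))   # None
```

Feeding it the contents of the real config.json (e.g. `find_json_error(open(path, encoding="utf-8").read())`) should report the same line and column as the gallery-dl error.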

Hrxn commented 1 year ago

Well, it's not literally filename - that was just an example.

The default filename is "{category}_{index}_{title}.{extension}"

So use this:

{
    "extractor":
    {
        "deviantart":
        {
            "client-id": null,
            "client-secret": null,
            "refresh-token": null,

            "auto-watch": false,
            "auto-unwatch": false,
            "comments": false,
            "extra": false,
            "flat": true,
            "folders": false,
            "group": true,
            "include": "gallery",
            "journals": "html",
            "jwt": false,
            "mature": true,
            "metadata": false,
            "original": true,
            "pagination": "api",
            "public": true,
            "quality": 100,
            "wait-min": 0,

            "filename": {
                "is_original": "{category}_{index}_{title}.{extension}",
                ""           : "LOW_RES_{category}_{index}_{title}.{extension}"
            }
        }
    }
}
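To see what the two format strings in this config produce, here they are filled in with Python's `str.format` and made-up metadata (the ID and title are illustrative). gallery-dl's own formatter supports more than `str.format`, so this only approximates it for simple `{field}` replacements. The point is that the low res name is just the full res name with a fixed `LOW_RES_` prefix, which keeps the two easy to find and pair up later.

```python
FULL_RES = "{category}_{index}_{title}.{extension}"
LOW_RES = "LOW_RES_{category}_{index}_{title}.{extension}"

# Made-up metadata for one deviation; real values come from gallery-dl.
meta = {
    "category": "deviantart",
    "index": 123456789,
    "title": "Some Artwork Title",
    "extension": "jpg",
}

full_name = FULL_RES.format(**meta)
low_name = LOW_RES.format(**meta)

print(full_name)  # deviantart_123456789_Some Artwork Title.jpg
print(low_name)   # LOW_RES_deviantart_123456789_Some Artwork Title.jpg

# The full res name can be recovered by stripping the fixed prefix.
assert low_name.removeprefix("LOW_RES_") == full_name
```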
Corrupt-Specturion commented 1 year ago

This works. I also had to replace is_original with is_downloadable to get it to work. Thank you for the help.

stillweebing commented 12 months ago

Short and incomplete answer for now: There is an is_original metadata field (#4559) that can be used in conditional filenames / directories to distinguish between low/full res images.

Is there any chance that the feature which enabled "downloading non-downloadable images in HQ" on DA will return? :')

Hrxn commented 12 months ago

@stillweebing Possible, yes. Likely? Not sure. It depends on dA; there's nothing that can be done on gallery-dl's side (unless someone discovers some new, previously unknown workaround).

sbobbo commented 11 months ago

So, the above doesn't really work for me. An image can still have is_downloadable=false even when it's not actually blocked from being downloaded: that flag seems to only disable the download button on the website and doesn't interfere with gallery-dl. is_original suffers from the same sort of inconsistency.

What's interesting is that when you do a -K check on a specific image that's fully behind a paywall, there is a field called "tier_access" set to "locked" that you'd think you could use in a condition. But when you do a -K check on an image that isn't behind a paywall, that field is omitted entirely. This is relevant because trying to do the conditional logic with:

"directory": {
        "tier_access == 'locked'": ["{category}","HAIDeviantArt","X_da-{author[username]}","Trash"],
        ""           : ["{category}","HAIDeviantArt","X_da-{author[username]}"]
}

results in: "Applying directory format string failed (NameError: name 'tier_access' is not defined)"
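The NameError is ordinary Python behaviour: a field missing from a file's metadata is simply an undefined name when the condition is evaluated. The sketch below reproduces it with Python's built-in `eval`, using the metadata dict as the local namespace (an assumption about how such conditions are evaluated, chosen for illustration), and shows a general Python workaround: looking the field up with `locals().get(...)`, which yields `None` instead of raising. Whether that exact expression works inside gallery-dl's conditions is not confirmed in this thread, so try it on a single image first.

```python
CONDITION = "tier_access == 'locked'"
# Looks the field up without failing when it is absent.
SAFE_CONDITION = "locals().get('tier_access') == 'locked'"


def check(expr: str, metadata: dict) -> bool:
    # Metadata is passed as the local namespace, so fields become plain names.
    return eval(expr, {}, metadata)


locked = {"tier_access": "locked"}
public = {}  # field omitted entirely, as on non-paywalled images

print(check(CONDITION, locked))       # True
print(check(SAFE_CONDITION, locked))  # True
print(check(SAFE_CONDITION, public))  # False

try:
    check(CONDITION, public)
except NameError as err:
    print(err)  # name 'tier_access' is not defined
```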

a-washing-machine commented 5 months ago

@mikf

Short and incomplete answer for now: There is an is_original metadata field (#4559) that can be used in conditional filenames / directories to distinguish between low/full res images.

"filename": {
    "is_original": "filename",
    ""           : "LOW_RES filename"
}

Neat. :)

Well, okay, I haven't gotten around to really testing this yet, but quick question: would the following currently be possible somehow?

If "deviantart_123456789_Some Artwork Title.png" already exists, in that case DON'T download "deviantart123456789[LOW_RES_PREFIX]_Some Artwork Title.jpg". (also considering that the LOW_RES image might be a JPG, while the old HD image might be a PNG)

If that's not currently possible, I imagine it would create a ton of undesired clutter by needlessly downloading low-res versions of images I already have in HD, and would mess up the abort-parameter. :/

It would probably "only" be a problem the first time I re-parse with the current gallery-dl version (I haven't done a reparse since October!), but cleaning up the clutter would take A LOT of manual work afterwards. :(

So it'd be great if there was some way of preventing clutter in the first place. ^_^;;

a-washing-machine commented 5 months ago

...soooo I take it that's not currently possible?

a-washing-machine commented 5 months ago

@Corrupt-Specturion

This works. I also had to replace is_original with is_downloadable to get it to work. Thank you for the help.

@sbobbo

So, the above doesn't really work for me. Something can be still is_downloadable=false when it's not actually blocked from being downloaded.

Hmm. For me, is_original works better than is_downloadable.

is_downloadable produces plenty of false positives, some of which aren't even image files (e.g. HTML submissions).

Comparing an older copy (August 2023) of one of my "benchmark galleries", with ~1850 submissions, against a fresh download with the LOWRES prefix enabled, I found that is_original correctly flagged all of the images, and only those images, whose width/height had changed.

(I've got a tool to recursively compare two given folders with images for changes in image-dimensions and/or file-size between the images in said folders. Very useful to check for site changes. ;-)
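A much simpler stand-in for that kind of comparison can be written with the Python standard library alone. This sketch pairs files from an old and a new download tree by relative folder and filename stem, ignoring the extension and an assumed `LOW_RES_` prefix (as in the config example above), then reports pairs whose file size differs. It compares byte sizes only; reading image width/height would need an image library such as Pillow. `compare_folders` and `index_tree` are hypothetical names, not the tool mentioned above.

```python
from pathlib import Path

PREFIX = "LOW_RES_"  # assumed low res prefix, matching the config example


def index_tree(root: Path) -> dict:
    """Map 'subdir/stem' (prefix and extension stripped) to the file path.

    If two files in one folder share a stem, the last one seen wins.
    """
    index = {}
    for path in root.rglob("*"):
        if path.is_file():
            stem = path.stem
            if stem.startswith(PREFIX):
                stem = stem[len(PREFIX):]
            key = (path.parent.relative_to(root) / stem).as_posix()
            index[key] = path
    return index


def compare_folders(old_root: str, new_root: str) -> list:
    """Return (key, old_size, new_size) for paired files whose size differs."""
    old = index_tree(Path(old_root))
    new = index_tree(Path(new_root))
    changed = []
    for key in sorted(old.keys() & new.keys()):
        old_size = old[key].stat().st_size
        new_size = new[key].stat().st_size
        if old_size != new_size:
            changed.append((key, old_size, new_size))
    return changed
```

Usage: `compare_folders("gallery-2023-08", "gallery-fresh")` returns one tuple per changed file, which can then be cross-checked against the files that got the prefix.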