mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Postprocessor setting discrepancies #4830

Open biggestsonicfan opened 11 months ago

biggestsonicfan commented 11 months ago

I currently use this setup for grabbing all metadata into json files for parsing:

"json_metadata":
{
    "name": "metadata",
    "event": "XXX",
    "mode": "json",
    "directory": "json",
    "extension-format": "json"
},

If I use "event": "prepare", I cannot download text-only posts from Patreon. If I use "event": "post", filenames for Twitter video files are parsed as None, and the extension type is not displayed on the command line when a duplicate file is detected.

Do I need a separate prepare and post json metadata grabber or are these just bugs?

mikf commented 11 months ago

"event": "post" triggers once for each "post" / Tweet / container-like thing that can contain files. This happens before any files are processed, so you don't get any filenames or paths, and it also doesn't matter how many files this post contains.

"event": "prepare" (and file, after, skip) triggers for each file download. It therefore does not trigger when there are no files, but there is filename metadata available when it does.
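The event order described above can be sketched as a toy model. This is not gallery-dl's actual code; the function name fire_events and the sample posts are made up for illustration, and only the "post" and "prepare" events are modelled.

```python
# Toy model of the event order described above; NOT gallery-dl's implementation.
# fire_events and the sample data are hypothetical names used for illustration.
def fire_events(posts):
    """Return (event, post_id, filename) tuples in the order they would fire."""
    fired = []
    for post in posts:
        # "post" fires once per post, before any file is processed,
        # so no filename is known at this point.
        fired.append(("post", post["id"], None))
        for filename in post["files"]:
            # "prepare" fires once per file, so filename metadata is available.
            fired.append(("prepare", post["id"], filename))
    return fired


posts = [
    {"id": 1, "files": ["a.jpg", "b.mp4"]},
    {"id": 2, "files": []},  # text-only post: no files at all
]
events = fire_events(posts)
for event in events:
    print(event)
```

In this model the text-only post (id 2) produces a "post" event but no "prepare" event, which matches the behaviour reported for Patreon text posts.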

> Do I need a separate prepare and post json metadata grabber

Depending on what exactly you want to achieve, you might need both.
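A setup with both might look roughly like the following sketch, which builds two postprocessor entries in Python and prints them as JSON for pasting into a config file. The entry names json_metadata_file and json_metadata_post are hypothetical. The "filename" option and the "{id}" field on the post entry are assumptions to verify against the gallery-dl configuration documentation; they are used here because no per-file filename exists at "post" time.

```python
import json

# Options shared by both entries, taken from the setup quoted in this thread.
base = {"name": "metadata", "mode": "json", "directory": "json"}

postprocessors = {
    # One JSON file per downloaded file (file metadata is available).
    "json_metadata_file": {
        **base,
        "event": "prepare",
        "extension-format": "json",
    },
    # One JSON file per post, including text-only posts.
    # ASSUMPTION: "filename" and "{id}" should be checked against the docs.
    "json_metadata_post": {
        **base,
        "event": "post",
        "filename": "{id}.json",
    },
}

print(json.dumps(postprocessors, indent=4))
```

Both entries would then have to be referenced from the extractor's postprocessor list in the same way as the single json_metadata entry is today.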

biggestsonicfan commented 11 months ago

Okay, after reading your response, downloading a tweet twice with both prepare and post, comparing the json files, then rereading your response, I finally get it.

What I do want is a json metadata file per downloaded file from Patreon, but I also want to get the text posts. This could get tricky...

biggestsonicfan commented 11 months ago

Can we add a test to test_postprocessor.py to spit out the unique (or otherwise not null) entries per prepare and post processes for a given/each extractor?

mikf commented 11 months ago

That can already be done with -K or -j. post has directory metadata, prepare has file metadata.

For Patreon in particular, the difference between the two is hash, type, num, filename, and extension when going by the code.
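Given two metadata dumps saved from such output, the comparison can be done with a few lines of Python. This is only a sketch: the function name file_only_keys is hypothetical, and the two dictionaries hold made-up sample values rather than real Patreon data.

```python
def file_only_keys(post_meta, file_meta):
    """Keys that carry a value in file metadata but are missing or None in post metadata."""
    return sorted(
        key
        for key, value in file_meta.items()
        if value is not None and post_meta.get(key) is None
    )


# Made-up sample values for illustration only.
post_meta = {"id": 123, "title": "example", "filename": None}
file_meta = {
    "id": 123,
    "title": "example",
    "hash": "abc",
    "type": "image",
    "num": 1,
    "filename": "pic",
    "extension": "png",
}

difference = file_only_keys(post_meta, file_meta)
print(difference)
```

With these sample dictionaries the result is the same five keys named above.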

biggestsonicfan commented 11 months ago

How much of an undertaking would it be to create a consolidated metadata category, where it's populated by the premetadata, processes normally extracting metadata as it goes along (if any downloads occur), then replaces any None values with post metadata and adds new keys?
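As a rough illustration of the idea, the merge could be done outside gallery-dl on two already-written metadata dictionaries. This is one possible reading of the proposal, not an existing gallery-dl feature; the function name consolidate and the sample values are hypothetical.

```python
def consolidate(post_meta, file_meta=None):
    """Start from post metadata, then overlay file metadata.

    File values win unless they are None; keys that exist only in the
    file metadata are added. With no file metadata (text-only post),
    the post metadata is returned unchanged as a copy.
    """
    merged = dict(post_meta)
    for key, value in (file_meta or {}).items():
        if value is not None or key not in merged:
            merged[key] = value
    return merged


# Made-up sample values for illustration only.
post_meta = {"id": 1, "title": "t", "filename": None}
file_meta = {"id": 1, "title": None, "filename": "pic", "extension": "png"}

with_file = consolidate(post_meta, file_meta)
text_only = consolidate(post_meta)
print(with_file)
print(text_only)
```

Doing this inside gallery-dl would presumably need the post metadata to be kept around until each file event fires, which is the part that only the maintainer can estimate.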