mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.68k stars 952 forks

ugoira postprocessor keep-files false and leftover metadata jsons #6270

Open thatfuckingbird opened 2 weeks ago

thatfuckingbird commented 2 weeks ago

I have the following postprocessor config:

        "postprocessors": [
            {
                "name": "ugoira",
                "whitelist": ["pixiv", "danbooru"],
                "keep-files": true,
                "mode": "mkvmerge"
            },
            {
                "name": "ugoira",
                "whitelist": ["pixiv", "danbooru"],
                "keep-files": false,
                "mode": "archive",
                "metadata": true
            }
        ],

and pixiv.metadata enabled, pixiv.ugoira set to original.

This seems to work well: a video and a zip are produced as expected, and the original frame images are removed. However, the metadata .json files for the frame images are not removed, so I am left with a lot of JSON files that have essentially the same content and whose filenames match neither the archive nor the video (the files are named like XXXX_pYYYY.jpg.json and the archive like XXXX_p0.zip, so the extensions do not match). If pixiv.ugoira is not set to original, I get a single XXXX_p0.zip.json to go with the archive, and I want to achieve similar behavior even when using original. Basically, something like taking the first frame's JSON (XXXX_p0.jpg.json), renaming it to XXXX_p0.zip.json, and deleting the other frame JSONs. Yes, technically that JSON belongs to the first frame, not the zip or the video, but the only difference is the filename/url field in the JSON, which still points to the first frame image. I think this compromise is acceptable, and nothing better comes to mind. What do you think?

I do know I can achieve this with a custom postprocessor; this issue is more about what gallery-dl should do with the metadata JSON in this scenario.
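The rename-and-delete compromise described above can be sketched as a standalone script. This is not a gallery-dl postprocessor and uses no gallery-dl API; the function name `consolidate_frame_metadata` is hypothetical, and the sketch assumes the naming scheme from the report (frame metadata as XXXX_pN.ext.json next to an archive named XXXX_p0.zip). It only touches metadata whose frame image has already been removed.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme (taken from the report above): frame metadata is
# XXXX_pN.<ext>.json and the ugoira archive is XXXX_p0.zip.
FRAME_JSON = re.compile(r"^(?P<post>.+)_p(?P<index>\d+)\.(?P<ext>[^.]+)\.json$")


def consolidate_frame_metadata(directory):
    """Keep one metadata file per ugoira archive and delete the other frame JSONs.

    Hypothetical helper, not part of gallery-dl. Returns the kept metadata paths.
    """
    directory = Path(directory)
    groups = defaultdict(list)
    for path in directory.glob("*_p*.json"):
        match = FRAME_JSON.match(path.name)
        if match is None or match["ext"] == "zip":
            continue  # not frame metadata, or already the archive's metadata
        if path.with_suffix("").exists():
            continue  # the frame image was kept, so its metadata stays too
        groups[match["post"]].append((int(match["index"]), path))

    kept = []
    for post, frames in groups.items():
        if not (directory / f"{post}_p0.zip").exists():
            continue  # no archive, so this is not a converted ugoira
        frames.sort()
        target = directory / f"{post}_p0.zip.json"
        first = frames[0][1]
        if target.exists():
            first.unlink()  # the archive already has metadata
        else:
            first.rename(target)  # the first frame's JSON becomes the archive's JSON
        for _, path in frames[1:]:
            path.unlink()
        kept.append(target)
    return kept
```

Calling `consolidate_frame_metadata("some/download/dir")` after a download run would leave one XXXX_p0.zip.json per archive. As noted in the comment above, the filename/url fields inside that JSON still describe the first frame.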

mikf commented 2 weeks ago

There needs to be something like a --write-metadata-post option that writes only one metadata file per post regardless of how many files it contains. This would also help with downloading text-only posts.

thatfuckingbird commented 2 weeks ago

In general I do want a metadata file for each file (if for nothing else, because it makes it easier for automated tools to match the metadata with the image). That is, if I were keeping the original frames, the current behavior of having a metadata file for each frame would be fine. This case is special only because the original frames end up inside the zip, so there is not much point in keeping their metadata as individual files.

If there were a way to apply this proposed --write-metadata-post option only to ugoira, that would solve the problem.

mikf commented 2 weeks ago

Right, you need this for hydownloader to support "original" ugoira frames, so it needs to cover this special case for Pixiv only and do the regular --write-metadata functionality everywhere else.

I don't think this could ever be done without several custom post processors. Maybe I could add a special command-line flag for it or another --ugoira target like copy+archive+metadata? How configurable would this need to be?

thatfuckingbird commented 2 weeks ago

> I don't think this could ever be done without several custom post processors. Maybe I could add a special command-line flag for it or another --ugoira target like copy+archive+metadata? How configurable would this need to be?

Well, if you can add a ugoira mode that would be "make an archive, make a video as close to lossless as possible, and leave 1 metadata file (or 1+1 for the video and the archive)", that would achieve what I want to be the default in hydl and what I was trying for with the postprocessors above. I don't really need additional configuration (well, maybe whether to use mkvmerge or just ffmpeg, but I think I might just make mkvmerge required by default). What's important for me is that this is configurable in the config file, not just as a CLI flag, since I want to leave open the option for people to change their ugoira config if they want.

However, the current behavior isn't really that bad either. I already have a workaround in place for ugoira to use the same JSON metadata file for both archive and video, since that's already what happens when "original" is off (there's a zip and a webm but only 1 JSON). So overall, the only "problem" from hydl's POV is the leftover JSON files, but that's only an annoyance/clutter, it occurs only in "original" mode, and it doesn't affect the actual operation of the program (and I could add some cleanup routine or something).
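For illustration, a lookup that maps both the archive and the video to one shared metadata file could look like the sketch below. This is not hydownloader's actual workaround, which is not shown in this thread; the function name `shared_metadata_path` is hypothetical, and it assumes the XXXX_p0.zip.json naming mentioned earlier for the case where "original" is off.

```python
from pathlib import Path


def shared_metadata_path(media_path):
    """Return the metadata path shared by a ugoira archive and its video.

    Hypothetical helper. Assumes the single metadata file is named after the
    archive, i.e. XXXX_p0.zip.json, regardless of the media file's extension.
    """
    media_path = Path(media_path)
    # Drop the media extension (.zip, .webm, ...) and append ".zip.json".
    return media_path.with_name(media_path.stem + ".zip.json")
```

With this, both `XXXX_p0.zip` and `XXXX_p0.webm` resolve to `XXXX_p0.zip.json`, so a tool only needs one JSON per ugoira.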

Hrxn commented 2 weeks ago

I think mkvmerge is already the default (if found), isn't it?