mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Help in regards to configuring downloads for Kemono #3644

Closed. MarketApplier closed this issue 1 year ago

MarketApplier commented 1 year ago

A while back, I had a setup for kemono.party, where posts would download to folders that contained their Post ID, and Post Name. Within the folders would be the attachments, with their original names intact, with nothing else added onto them. In addition, there would be a .json file that contains the text of the post, so I could get any links, should a post have them, as well as download posts that don't have attachments on them.

I lost this configuration file when I formatted my computer, and was wondering if anyone might be able to help me in finding a configuration that matches about what I describe. I greatly appreciate any and all who are able to help me out here.

a84r7a3rga76fg commented 1 year ago

I use this for saving posts with URLs. Filtering by keyword works better than saving every post that contains a URL, because most URLs are unrelated to file sharing and a lot of content creators deliberately cut their URLs into pieces.

            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{service}_{user}_{id}.txt",
                    "filter": "embed.get('url') or re.search(r'(?i)(redgifs|atomicloli|gfycat|google|drive|onedrive|1drv|mega|xgf|k00|koofr|gigafile|mediafire|porn3dx|gofile|dropbox)', content)",
                    "mode": "custom",
                    "format": "{content}\n{embed[url]:?/\n/}"
                }
            ]
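
For anyone unsure what that "filter" expression ends up matching, here is a rough Python sketch of the same test run against a few made-up posts. The keyword list is shortened, and the `wants_metadata` helper and the sample posts are my own assumptions for illustration, not gallery-dl code.

```python
import re

# Shortened keyword list for illustration; the full list is in the config above.
HOST_PATTERN = re.compile(r"(?i)(mega|mediafire|gofile|dropbox|drive)")


def wants_metadata(post: dict) -> bool:
    """Return True when the post has an embed URL or mentions a file host."""
    embed = post.get("embed") or {}
    content = post.get("content") or ""
    return bool(embed.get("url") or HOST_PATTERN.search(content))


# Hypothetical posts, only the fields the filter looks at.
posts = [
    {"id": "1", "content": "New sketch, enjoy!", "embed": {}},
    {"id": "2", "content": "Full pack on MEGA, link in parts: me ga . nz / abc", "embed": {}},
    {"id": "3", "content": "", "embed": {"url": "https://example.com/video"}},
]

matched = [p["id"] for p in posts if wants_metadata(p)]
print(matched)  # ['2', '3']
```

Post 2 is caught by the keyword even though its URL is split up, which is the point of filtering on keywords. Short keywords such as "drive" will also match unrelated words, so expect some false positives.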

This is what I use for downloading files. Attachments are saved to a folder named after the post's date and ID, inside a parent folder named after the service and user. Common services get separate archives, original filenames are truncated to 10 characters, updated files are downloaded again, and requests that fail with HTTP 429 or 430 are retried indefinitely.

        "kemonoparty": 
        {
            "archive-format": "{service}_{user}_{id}_{num}_{hash}",
            "archive": "~/gallery-dl/archives/kemono-archive.sqlite3",
            "base-directory": "E:/gallery-dl/Artists Kemono",
            "directory": [
                "{service} kemono.party {user[:100]}",
                "{date!s:.10} {id}"
            ],
            "filename": "{num:>03} {_now!s:.16} {filename[:10]}.{extension}",
            "retries": -1,
            "retry-codes": [429, 430],
            "discord":
            {
                "#": "discord-specific settings",
                "archive-format": "{subcategory}_{server}_{channel}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-discord-archive.sqlite3",
                "directory": [
                    "{subcategory} kemono.party {server}",
                    "{channel_name[:25]} {channel}",
                    "{date!s:.10} {id}"
                ],
                "filename": "{num:>03} {_now!s:.16} {filename[:10]}.{extension}"
            },
            "fanbox":
            {
                "#": "fanbox-specific settings",
                "archive-format": "{service}_{user}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-fanbox-archive.sqlite3"
            },
            "fantia":
            {
                "#": "fantia-specific settings",
                "archive-format": "{service}_{user}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-fantia-archive.sqlite3"
            },
            "patreon":
            {
                "#": "patreon-specific settings",
                "archive-format": "{service}_{user}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-patreon-archive.sqlite3"
            },
            "gumroad":
            {
                "#": "gumroad-specific settings",
                "archive-format": "{service}_{user}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-gumroad-archive.sqlite3"
            },
            "subscribestar":
            {
                "#": "subscribestar-specific settings",
                "archive-format": "{service}_{user}_{id}_{num}_{hash}",
                "archive": "~/gallery-dl/archives/kemono-subscribestar-archive.sqlite3"
            }
        }

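
To show what the "directory" and "filename" strings above expand to, here is a small sketch using plain Python `str.format()`, which handles the `{num:>03}` and `{date!s:.10}` parts the same way. The metadata values are invented, and the `{filename[:10]}` slice is gallery-dl syntax that `str.format()` does not support, so it is applied by hand.

```python
from datetime import datetime

# Hypothetical metadata for one attachment; real values come from gallery-dl.
meta = {
    "num": 3,
    "date": datetime(2023, 2, 10, 12, 30, 45),
    "_now": datetime(2023, 2, 12, 8, 5, 0),
    "id": "506575",
    "filename": "a_very_long_original_name",
    "extension": "png",
}

# "!s" converts the datetime to a string, ".10" keeps its first 10 characters.
directory = "{date!s:.10} {id}".format(**meta)

# ">03" right-aligns the number and pads it with zeros to width 3.
# The slice replaces gallery-dl's "{filename[:10]}".
filename = "{num:>03} {_now!s:.16} {short}.{extension}".format(
    short=meta["filename"][:10], **meta
)

print(directory)  # 2023-02-10 506575
print(filename)   # 003 2023-02-12 08:05 a_very_lon.png
```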
mikf commented 1 year ago

> where posts would download to folders that contained their Post ID, and Post Name.

    "directory": ["{id} {title}"],

> Within the folders would be the attachments, with their original names intact, with nothing else added onto them

    "filename": "{filename}.{extension}",

> In addition, there would be a .json file that contains the text of the post, so I could get any links, should a post have them, as well as download posts that don't have attachments on them.

Use a metadata post processor for this, for example

    "postprocessors": [{
        "name": "metadata",
        "event": "post",
        "filename": "content.txt",
        "format": "{content}\n"
    }]

To have it write more than just the text content, modify or remove "format".


Putting it all together:

{
    "directory": ["{id} {title}"],
    "filename": "{filename}.{extension}",

    "postprocessors": [{
        "name": "metadata",
        "event": "post",
        "filename": "content.txt",
        "format": "{content}\n"
    }]
}
MarketApplier commented 1 year ago

Thanks for the detailed response! I didn't have a Kemono class defined before, so I created one. The Filename addition works fine, though the directory addition doesn't seem to work, as files are still just laid bare in the User ID folder, instead of being in folders with their Post ID, and their Post Name. Does the Directory addition need to be placed elsewhere, perhaps?

Here's what the addition looks like, on my end:

"kemono":
        {
            "directory": ["{id} {title}"],
            "filename": "{filename}.{extension}"
        },

I'm also not 100% sure where the Postprocessor stuff goes. I do see an entry at the top of my file, in Line 6, which mentions Postprocessors, but I'm not sure if this is a global thing, or if it will only affect Kemono, which is my desired outcome.

Thank you for your help. I really do appreciate it. Sorry for any trouble.

Hrxn commented 1 year ago

> I'm also not 100% sure where the Postprocessor stuff goes. I do see an entry at the top of my file, in Line 6, which mentions Postprocessors, but I'm not sure if this is a global thing, or if it will only affect Kemono, which is my desired outcome.

Simplest way, also inside of the "kemonoparty" object:

{
    "extractor":
    {
        "kemonoparty":
        {
            "directory": ["Kemono", "{id} {title}"],
            "filename": "{filename}.{extension}",

            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "filename": "content.txt",
                "format": "{content}\n"
            }]
        }
    }
}
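
If you already have a config file with other sites in it, the nesting is the part that matters. A minimal Python sketch of merging these settings into an existing config, assuming it is plain JSON; the `with_kemono_settings` helper and the "twitter" entry are made up for the example.

```python
import json


def with_kemono_settings(config: dict) -> dict:
    """Return a copy of config with the settings placed under extractor.kemonoparty."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    section = merged.setdefault("extractor", {}).setdefault("kemonoparty", {})
    section.update({
        "directory": ["Kemono", "{id} {title}"],
        "filename": "{filename}.{extension}",
        "postprocessors": [{
            "name": "metadata",
            "event": "post",
            "filename": "content.txt",
            "format": "{content}\n",
        }],
    })
    return merged


# Hypothetical existing config with an unrelated site already configured.
existing = {"extractor": {"base-directory": "./gallery-dl/", "twitter": {"retries": 2}}}
updated = with_kemono_settings(existing)
print(json.dumps(updated, indent=4))
```

The post processor ends up inside "kemonoparty" only, so other sites are left alone.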
mikf commented 1 year ago

Small correction: kemono -> kemonoparty
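
The reason the name matters, as a simplified Python model: settings are looked up by the extractor's category name, so a block under a different key is never consulted. This `lookup` function is only an assumption-laden sketch of that idea (most specific section first, then falling back outwards), not gallery-dl's actual code.

```python
DEFAULT_DIRECTORY = ["{category}", "{service}", "{user}"]


def lookup(config: dict, category: str, subcategory: str, key: str, default=None):
    """Simplified model: the most specific section that defines the key wins."""
    extractor = config.get("extractor", {})
    cat = extractor.get(category, {})
    sub = cat.get(subcategory)
    if not isinstance(sub, dict):
        sub = {}
    for section in (sub, cat, extractor):
        if key in section:
            return section[key]
    return default


wrong = {"extractor": {"kemono": {"directory": ["{id} {title}"]}}}
right = {"extractor": {"kemonoparty": {"directory": ["{id} {title}"]}}}

print(lookup(wrong, "kemonoparty", "fanbox", "directory", DEFAULT_DIRECTORY))
# ['{category}', '{service}', '{user}']
print(lookup(right, "kemonoparty", "fanbox", "directory", DEFAULT_DIRECTORY))
# ['{id} {title}']
```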

You can use -E to check the expected category names and whether your custom directory and file names actually get used:

$ gallery-dl -E https://kemono.party/fanbox/user/6993449/post/506575
Category / Subcategory
  "kemonoparty" / "fanbox"

Filename format (custom):
  "{filename}.{extension}"
Filename format (default):
  "{id}_{title}_{num:>02}_{filename[:180]}.{extension}"

Directory format (custom):
  ["Kemono", "{id} {title}"]
Directory format (default):
  ["{category}", "{service}", "{user}"]

Archive format (default):
  "{service}_{user}_{id}_{num}"
MarketApplier commented 1 year ago

Thank you all for your help! This is the setup I found that does close to exactly what I want. Aside from maybe the .txt files for text (I seem to recall in my old setup, it would download text posts as JSON, which were a bit different, but I could be misremembering), the code below works exactly how I want.

        "kemonoparty":
        {
                "directory": ["{category}", "{service}", "{user}", "{id} {title}"],
                "filename": "{filename}.{extension}",

                "postprocessors": [{
                    "name": "metadata",
                    "event": "post",
                    "filename": "content.txt",
                    "format": "{content}\n"
                }]
        },
mikf commented 1 year ago

> I seem to recall in my old setup, it would download text posts as JSON

Remove the format setting and it'll write out all metadata in JSON format, and you can obviously set the filename for each metadata file however you want.
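
Once the metadata is written as JSON, pulling the links out of a post is a few lines of Python. This is only a sketch: it assumes the post text sits in a "content" field, as in the format string above, and the sample data and the `links_in_post` helper are invented for the example.

```python
import json
import re

# Deliberately simple URL pattern; stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def links_in_post(metadata_text: str) -> list[str]:
    """Return every URL found in the "content" field of one metadata JSON document."""
    post = json.loads(metadata_text)
    return URL_PATTERN.findall(post.get("content") or "")


# Stand-in for the text of a metadata file written by the post processor.
sample = json.dumps({
    "id": "506575",
    "title": "Example post",
    "content": '<p>Pack: <a href="https://example.com/pack.zip">download</a></p>',
})

print(links_in_post(sample))  # ['https://example.com/pack.zip']
```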

skulkexpert commented 1 year ago

@a84r7a3rga76fg is "archive-format": "{service}_{user}_{id}_{num}_{hash}" really the best choice? If an artist on kemono decides to update their post and the order of the images changes (each image gets a new {num}), wouldn't the entire post be re-downloaded, with a duplicate of every existing file plus whatever new file was added?

I've currently implemented your config, it is really nice. But I'm worried that this might break my setup lol.

Is "archive-format": "{service}_{user}_{id}_{hash}" maybe enough to avoid this?

a84r7a3rga76fg commented 1 year ago

@skulkexpert I believe so. I want a post's full numerical order and any changes made to an already downloaded post. If you remove {num} and download a new post that has 4 files, e.g. 001.png 002.png 003.png 004.png, where 001.png and 003.png are identical, it'll skip 003.png. Also, you'll know a post has been updated when it's got files with the same {num}. If a post has been updated, I re-download it in its entirety and keep old unique files in a subfolder (e.g. "{date!s:.10} {id}", "old"). Kemono.party keeps post revisions, but they're not accessible in a user-friendly way.
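
To make the difference concrete, here is a toy Python simulation of the two archive formats. It assumes the archive is just a set of formatted keys and that a file is skipped when its key is already in the set; the `simulate` and `post` helpers and the hash values are made up.

```python
def simulate(archive_format: str, runs: list[list[dict]]) -> list[list[int]]:
    """Return, for each run, the {num} values of the files that get downloaded."""
    archive: set[str] = set()
    downloaded = []
    for files in runs:
        fetched = []
        for meta in files:
            key = archive_format.format(**meta)
            if key not in archive:  # unseen key: download and remember it
                archive.add(key)
                fetched.append(meta["num"])
        downloaded.append(fetched)
    return downloaded


def post(hashes: list[str]) -> list[dict]:
    """Build hypothetical file metadata for one post, numbered from 1."""
    return [
        {"service": "fanbox", "user": "1", "id": "10", "num": n, "hash": h}
        for n, h in enumerate(hashes, 1)
    ]


first = post(["aaa", "bbb"])
edited = post(["ccc", "aaa", "bbb"])  # artist inserts a new file at the front

print(simulate("{service}_{user}_{id}_{num}_{hash}", [first, edited]))
# [[1, 2], [1, 2, 3]]  -> the whole post is fetched again
print(simulate("{service}_{user}_{id}_{hash}", [first, edited]))
# [[1, 2], [1]]        -> only the new file is fetched

# One post where files 1 and 3 are identical.
print(simulate("{service}_{user}_{id}_{hash}", [post(["x", "y", "x", "z"])]))
# [[1, 2, 4]]          -> the duplicate at position 3 is skipped
```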

Also, I recommend using jdupes on your kemono.party directory to save space by turning identical files into hardlinks.

skulkexpert commented 1 year ago

@a84r7a3rga76fg I see, thanks for the response. I hadn't considered the fact there could be multiple identical files in a single post (that aren't automatically skipped by gallery-dl). Is there any reason to download these duplicates? I only really want the files that matter, so I don't think that I want to do this, personally.

I'm not very interested in preserving the order of the files in the post. If the artist adds a file in the middle of the order, then the new file will be numbered in roughly the right place (and I can always check the metadata for the correct order or understand it from context). If a file is added to the end or beginning (which is the most common case), I will still understand where it should be, since I'm using "filename": "{id}_{num:>03}_{title}_{_now:%Y-%m-%dT%H-%M-%S}_{filename[:40]}.{extension}".

Thanks for the jdupes recommendation, I will check it out. But for my kemono setup, I shouldn't really have too many duplicates with my "archive-format": "{service}_{user}_{id}_{hash}" config, right? And I shouldn't be missing or overwriting any files, either?

a84r7a3rga76fg commented 1 year ago

@skulkexpert That archive-format will only download a post's unique attachment file once. You won't be missing or overwriting any files with those settings, but keep in mind that gallery-dl can't check for duplicates inside compressed files, and most Japanese artists spam attachment files and compressed files that are all duplicates, in one post or across multiple posts.

You can look for duplicates in compressed files with 7zip by using 7z l -slt archive.zip.
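
If you'd rather script it for .zip files, Python's `zipfile` module exposes the same kind of per-entry information (CRC-32 and size). A rough sketch that groups members with matching values; the `duplicate_members` helper is my own, and matching CRC and size is a strong hint of a duplicate rather than proof.

```python
import io
import zipfile
from collections import defaultdict


def duplicate_members(zip_source) -> list[list[str]]:
    """Group member names that share a CRC-32 and an uncompressed size."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                groups[(info.CRC, info.file_size)].append(info.filename)
    return [names for names in groups.values() if len(names) > 1]


# Build a small in-memory archive so the example runs without files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("001.png", b"same bytes")
    archive.writestr("002.png", b"other bytes")
    archive.writestr("003.png", b"same bytes")

buffer.seek(0)
print(duplicate_members(buffer))  # [['001.png', '003.png']]
```

Pass a path such as "archive.zip" instead of the buffer to check a real file.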