mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Extract booru notes and put them in text file #5556

Open ocrhell opened 2 months ago

ocrhell commented 2 months ago

Is there a way to extract notes (translations) and put them in a text file named like the downloaded image? Specifically for gelbooru. I'm running an instance with this in my config file:

        "booru":
        {
            "tags": false,
            "notes": true
        }

Only the image is downloaded and the notes aren't extracted at all. Not even in cmd. Should I be adding anything in gelbooru's block?

Thanks.

Hrxn commented 2 months ago

Have you tried it with a "metadata" post-processor?

https://gdl-org.github.io/docs/configuration.html#postprocessor-configuration
https://gdl-org.github.io/docs/configuration.html#postprocessor-options

For example

{
    "extractor":
    {
        "booru":
        {
            "..": "..",

            "postprocessors":[

                {
                    "name" : "metadata",
                    "event": "post",
                    "mode" : "custom",
                    "skip": true,
                    "content-format": "{content|description}\n",
                    "filename": "{id}.txt"
                }
            ]

        }
    }
}

Of course, you need to check the output with -K to see whether the field you want is actually {content}, or {description}, or whatever name the translation goes by, assuming the site provides such translations at all.
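To make the "{content|description}" part of the example above more concrete: as far as I understand it, that syntax means "use content, and fall back to description if content is missing or empty". Here is a rough Python sketch of that idea. The function name first_available and the sample post dict are made up for illustration, and this is not gallery-dl's actual implementation (in particular, what gallery-dl prints when no field matches may differ from the empty default used here).

```python
def first_available(metadata, *names, default=""):
    """Return the first field in `names` that is present and non-empty.

    Hypothetical helper, only meant to illustrate the fallback idea.
    """
    for name in names:
        value = metadata.get(name)
        if value:
            return value
    return default


# Made-up metadata for a post that has no "content" field
post = {"id": 123, "description": "translated text"}

# Roughly what "content-format": "{content|description}\n" would produce
line = first_available(post, "content", "description") + "\n"
print(line, end="")
```

Whichever field names you end up with, -K is still the way to confirm they exist for the site in question.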

ocrhell commented 2 months ago

Closed it prematurely, sorry. Going through the notes block from gelbooru.py and gelbooru_v02.py, is it possible to filter out height, width, x, y?

notes.append({
    "width" : int(extr(note, 'data-width="', '"')[0]),
    "height": int(extr(note, 'data-height="', '"')[0]),
    "x"     : int(extr(note, 'data-x="', '"')[0]),
    "y"     : int(extr(note, 'data-y="', '"')[0]),
    "body"  : extr(note, 'data-body="', '"')[0],
})

I've tried # but that doesn't work. I also tried multiple variations of notes.x / notes.width etc. with an additional postprocessors instance with delete, placed before and after. That didn't work either. -K gives notes[N]['width'] / notes[N]['height'] etc., and I've tried those too.
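One possible workaround outside of gallery-dl, sketched in Python: take the metadata for a post (for example loaded from a JSON metadata file) and keep only the "body" of each note. This assumes the metadata has a "notes" list shaped like the dicts built in the snippet above; the function name notes_to_text and the sample data are made up, and the html.unescape call is only a guess that the bodies may still contain HTML entities.

```python
import html


def notes_to_text(metadata):
    """Join the "body" of every note into plain text, one note per line.

    width, height, x and y are simply ignored.
    """
    notes = metadata.get("notes") or []
    # Assumption: bodies may be HTML-escaped; unescape is harmless otherwise
    bodies = [html.unescape(note.get("body", "")).strip() for note in notes]
    return "\n".join(body for body in bodies if body)


# Made-up sample in the same shape as the extractor's notes list
sample = {
    "id": 123,
    "notes": [
        {"width": 50, "height": 20, "x": 10, "y": 15, "body": "Hello"},
        {"width": 40, "height": 30, "x": 90, "y": 60, "body": "World &amp; co"},
    ],
}

print(notes_to_text(sample))
```

The result could then be written to a file named after the post id. This doesn't answer whether the same filtering can be done purely from the config, it just sidesteps the question.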