mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Saving kemono comments with links #5694

Closed Hawker2 closed 2 weeks ago

Hawker2 commented 3 weeks ago

Saving Kemono comments has been discussed many times, but there appears to be a small gap. The metadata post-processor can save comments, and its filter can restrict output to posts that have links in content, but I can't find a way to filter on links in comments. A use case is a post that has links only in its comments and nothing in the content (e.g. NSFW https://kemono.su/patreon/user/10420419/post/90155856). Checking with "-K", comments does not appear to be an available variable.

For instance, the following works for content:

            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "skip": true,
                    "filename": "{id} {title[:60]}.txt",

                    "#": "write text content and external URLs",
                    "mode": "custom",
                    "format": "{content}\n{embed[url]:?/\n/}",

                    "#": "only write file if there is an external link present",
                    "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive|anonfiles)', content)"
                }
            ]

But this extension to use comments does not:

            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "skip": true,
                    "filename": "{id} {title[:60]}.txt",

                    "#": "write text content and external URLs",
                    "mode": "custom",
                    "format": "{content}\n{embed[url]:?/\n/}{comments}",

                    "#": "only write file if there is an external link present",
                    "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive|anonfiles)', content) or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive|anonfiles)', comments)"
                }
            ]

This fails, apparently because re.search does not have access to comments, which -K seems to confirm. The only workaround I can see is to remove the filter completely, which works but could create a LOT of excess files.

mikf commented 3 weeks ago

comments is a list of comment objects and therefore can't be passed as-is as an argument to re.search. The simplest way to make this work is to convert the whole list into one large string and search through that:

re.search(r"...", str(comments))
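To see why the list has to be converted first, here is a minimal standalone sketch in plain Python. The shape of the comment objects (dicts with "id" and "content" keys) and the helper names are assumptions made for illustration, not gallery-dl's actual data layout or API.

```python
import re

# Same pattern as in the filter from the question.
LINK_PATTERN = r"(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive|anonfiles)"

# Hypothetical comment objects; the real field names may differ.
comments = [
    {"id": "1", "content": "Thanks for the update!"},
    {"id": "2", "content": "Mirror: https://mega.nz/file/example"},
]


def list_search_fails(items):
    """Return True if re.search rejects the raw list with a TypeError."""
    try:
        re.search(LINK_PATTERN, items)
    except TypeError:
        return True
    return False


def has_link(items):
    """Stringify the whole list, then search the resulting text."""
    return re.search(LINK_PATTERN, str(items)) is not None


print(list_search_fails(comments))  # the raw list is not a valid argument
print(has_link(comments))           # the stringified list is searchable
```

In the filter from the question, this corresponds to changing the last call from `re.search(..., comments)` to `re.search(..., str(comments))`. Note that `str()` covers every field of every comment, including keys and any other metadata, so a keyword appearing outside the comment text would also count as a match.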