mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Kemono] How can I skip words in the attachment links? #3216

Open a84r7a3rga76fg opened 1 year ago

a84r7a3rga76fg commented 1 year ago

I couldn't find any SFW examples. Is it possible to skip "xgf" and other keywords inside attachment links? By attachment link, I mean the downloadable links under "Files".

    "postprocessors": [
        {
            "name": "metadata",
            "event": "post",
            "filename": "{service}=x=user=x={user[:100]}=x=post=x={id}.txt",
            "filter": "embed.get('url') or re.search(r'(?i)(xgf)', content)",
            "mode": "custom",
            "format": "{content}\n{embed[url]:?/\n/}"
        }
    ]

mikf commented 1 year ago

The last part of a download URL that contains an actual name can be accessed with filename.

Do you want to skip the download when it contains "xdf" or just the post processor?

afterdelight commented 1 year ago

Yes, I'm also interested in this. I want to skip files with the psd keyword in them. How?

mikf commented 1 year ago

You mean the psd filename extension, right? Then use --filter "extension != 'psd'"

And because that's going to be your next question, this is how to check for multiple extensions at once: --filter "extension not in ('psd', 'pdf', 'rar')"
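
Since the value passed to --filter is a Python expression, the tuple test can be tried in a plain interpreter first. A minimal sketch: keep is a hypothetical helper, the sample extensions are made up, and gallery-dl itself is not involved.

```python
# Hypothetical helper mirroring the --filter expression above.
def keep(extension):
    # True means the file would pass the filter and be downloaded.
    return extension not in ('psd', 'pdf', 'rar')

for ext in ('jpg', 'psd', 'zip', 'rar', 'PSD'):
    print(ext, keep(ext))
```

The comparison is case-sensitive, so an upper-case 'PSD' is not caught by this tuple.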

afterdelight commented 1 year ago

No, what I meant is to skip files in the Downloads section whose names contain the 'psd' keyword, such as:

Downloads

    Download 2019_PSD_4.zip
    Download 2019_JPG_4.zip

a84r7a3rga76fg commented 1 year ago

Do you want to skip the download when it contains "xdf" or just the post processor?

No, I want it to ignore "xdf" in any attachment link that starts with "https://downloads.fanbox.cc/images/post/".

Example: <a href="https://downloads.fanbox.cc/images/post/2152301/uRyM1G82XdfeDiYhKtVbOyjo.png">

afterdelight commented 1 year ago

Why do you want to skip files with random strings?

mikf commented 1 year ago

@a84r7a3rga76fg you can use another regex to search for such links and then negate the result with not: not re.search(r'(?i)(xgf)', content)
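
One way to read "ignore the keyword inside attachment links" is to remove those links from the text before searching it. The sketch below does that with re.sub. This is an assumption about what is wanted, not necessarily the approach suggested above; mentions_keyword and the LINK pattern are hypothetical names, and the URL prefix is the one quoted in this thread.

```python
import re

# Attachment links to leave out of the keyword search (prefix from the thread).
LINK = r'https://downloads\.fanbox\.cc/images/post/[^"\s<>]*'

def mentions_keyword(content, keyword='xdf'):
    """True if keyword occurs in content outside of the attachment links."""
    without_links = re.sub(LINK, '', content)
    return re.search(r'(?i)' + re.escape(keyword), without_links) is not None

only_in_link = ('<a href="https://downloads.fanbox.cc/images/post/'
                '2152301/uRyM1G82XdfeDiYhKtVbOyjo.png">')
in_text = 'new xdf pack ' + only_in_link

print(mentions_keyword(only_in_link))
print(mentions_keyword(in_text))
```

Written as a single expression, the same idea would be re.search(r'(?i)xdf', re.sub(LINK_PATTERN, '', content)) with the link pattern written out inline; whether that fits a given post processor "filter" has not been verified here.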

@afterdelight Then check filename in image-filter: "image-filter": "not re.search(r'(?i)(psd)', filename)"
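
To see what this expression does with the two files from the Downloads example, it can be run as ordinary Python. A rough sketch, assuming filename holds the name without its extension; keep_file is a hypothetical helper.

```python
import re

def keep_file(filename):
    # Same expression as in the image-filter above.
    return not re.search(r'(?i)(psd)', filename)

names = ['2019_PSD_4', '2019_JPG_4']
kept = [name for name in names if keep_file(name)]
print(kept)
```

The (?i) flag makes the match case-insensitive, which is why the upper-case "PSD" in the first name is caught.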

afterdelight commented 1 year ago

What if I want to filter out both the words 'psd' and 'korean'? Is this correct? "image-filter": "not re.search(r'(?i)(psd|korean)', filename)"

mikf commented 1 year ago

It is.

https://www.regular-expressions.info/tutorial.html https://regex101.com/
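
For longer keyword lists, the alternation can be generated instead of typed by hand. A small sketch: build_filter is a hypothetical helper, and the eval call only approximates how a filter expression is applied.

```python
import re

def build_filter(keywords, field='filename'):
    """Return a filter expression that rejects values containing any keyword."""
    # re.escape keeps regex metacharacters in a keyword from being interpreted.
    alternatives = '|'.join(re.escape(k) for k in keywords)
    return "not re.search(r'(?i)({})', {})".format(alternatives, field)

expr = build_filter(['psd', 'korean'])
print(expr)

# Try the generated expression on a few made-up names.
for name in ('2019_PSD_4', 'Korean_version', '2019_JPG_4'):
    print(name, eval(expr, {'re': re, 'filename': name}))
```

A keyword containing a single quote would break the generated string literal, so this helper is only suitable for simple words.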

afterdelight commented 1 year ago

Will image attachments in the content section be affected by image-filter?

a84r7a3rga76fg commented 1 year ago

@afterdelight I'm saving certain posts.

@mikf I couldn't get it to work.

afterdelight commented 1 year ago

@afterdelight I'm saving certain posts.

What do you mean by that?

Luke-L commented 1 year ago

Is there a way to make it skip certain words in the kemono/coomer posts? For example, if a post contains (#ad|trial|free for|@), then skip downloading it?

Something like --filter "title not in ('#ad', 'www', 'trial', 'free for', (\d{9}))"? And where do I put this flag: at the bottom of the kemono extractor, or in the command itself?

mikf commented 1 year ago

"title not in ('#ad', 'www', 'trial', 'free for', (\d{9}))"

Using a regular expression is better here, I think. Try something like "not re.search(r'(?i)#ad|www|trial|free for', title)"
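
The reason the regex fits better: "title not in (...)" compares the whole title against each tuple element, while re.search looks for the keywords anywhere inside the title. A short sketch with made-up titles; keep_by_tuple and keep_by_regex are hypothetical helpers.

```python
import re

KEYWORDS = ('#ad', 'www', 'trial', 'free for')
PATTERN = r'(?i)#ad|www|trial|free for'

def keep_by_tuple(title):
    # False only when the entire title equals one of the keywords.
    return title not in KEYWORDS

def keep_by_regex(title):
    # False when any keyword appears somewhere in the title.
    return not re.search(PATTERN, title)

for title in ('Free for 3 days #ad', 'trial', 'March sketches'):
    print(repr(title), keep_by_tuple(title), keep_by_regex(title))
```

The (\d{9}) from the question is not valid inside a Python tuple. It could presumably be added to the regex as one more alternative (|\d{9}), though that was not confirmed in the thread.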

and where do i put this flag?

The command-line option (--filter) goes into the command itself.

In your config file, you'd use image-filter: "image-filter": "not re.search(r'(?i)#ad|www|trial|free for', title)"
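
One detail to watch when moving a filter into the config file: the config is JSON, so any backslash in the regex has to be doubled. The pattern above contains none, but the \d{9} from the question would. The sketch below lets Python's json module do the escaping. Placing image-filter directly under "extractor" and adding the \d{9} alternative are both assumptions made for illustration.

```python
import json

# Filter expression as it would be typed on the command line;
# the \d{9} alternative is added here only to show the escaping.
expr = r"not re.search(r'(?i)#ad|www|trial|free for|\d{9}', title)"

# Assumed config layout: the option set for all extractors.
config = {"extractor": {"image-filter": expr}}

text = json.dumps(config, indent=4)
print(text)
```

In the printed JSON the backslash appears doubled, and loading the text back with json.loads returns the original expression.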