mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

I would like to know the option to download the text in post for fantia. #4126

Closed suzumiyabi closed 11 months ago

mikf commented 1 year ago

There is no easy option to do that. You'll have to use a metadata post processor in your config file and select the correct metadata fields to write (or you just let it write everything).

Something like this:

            "postprocessors": [
                {
                    "name" : "metadata",
                    "event": "post",
                    "filename": "{post_id}.txt",
                    "format": "{comment}"
                }
            ]
suzumiyabi commented 1 year ago

thanks!

Is fanbox the same metadata?

mikf commented 1 year ago

For Fanbox, the metadata field names containing text differ depending on the post type.

See https://github.com/mikf/gallery-dl/issues/3784#issuecomment-1473658169.

suzumiyabi commented 1 year ago

I added fanbox and fantia sections to gallery-dl.conf, but it doesn't work. Could you tell me what's wrong?

{
    "extractor": {
        "fanbox": {
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "filename": "{id}.txt",
                "#": "write text content",
                "format": [
                    "{title:?//}",
                    "{content:?//}",
                    "{html:?//}",
                    "{text:?//}",
                    "{excerpt:?//}"
                ]
            }]
        }
        ,"fantia": {
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "filename": "{post_id}.txt",
                "format": [
                    "{title:?//}",
                    "{comment:?//}",
                ]
            }]
        }
    }
}
mikf commented 1 year ago

Remove the comma at the end of "{comment:?//}", in fantia.

On fantia, there's also content_comment or blogpost_text for each content section of a post, so you might want to add {blogpost_text|content_comment} to format and {content_id} to filename.
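
A trailing comma like the one pointed out above can be caught before running gallery-dl. The following is a minimal sketch using only Python's standard `json` module, assuming the config file is parsed as strict JSON (which the failure above suggests). The helper name `find_json_error` and the shortened config string are illustrative and not part of gallery-dl.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Shortened version of the fantia "format" list from the config above,
# including the trailing comma after the last list element.
broken = '''{
    "format": [
        "{title:?//}",
        "{comment:?//}",
    ]
}'''

# Same text with the trailing comma removed.
fixed = broken.replace('"{comment:?//}",', '"{comment:?//}"')

print(find_json_error(broken))  # an error message pointing near the end of the list
print(find_json_error(fixed))   # None
```

To check a real file, pass its contents to the helper, or run `python -m json.tool gallery-dl.conf` from a shell, which reports the same kind of error.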

biggestsonicfan commented 1 year ago

so you might want to add {blogpost_text|content_comment} to format

@mikf This only ever gives me None. Has a recent change deprecated this?

        "fantia-metadata":
        {
            "name": "metadata",
            "event": "post",
            "filename": "{post_id}-{content_id}.html",
            "format": [
                    "{title:?//}",
                    "{comment:?//}",
                    "{blogpost_text|content_comment}"
            ],
            "directory": "metadata"
        }
mikf commented 1 year ago

I get "This is an image gallery." and "This is a test." for this test post. There have been some changes (dc7af000), but all metadata names remained the same.

biggestsonicfan commented 1 year ago

I get "This is an image gallery." and "This is a test." for this test post. There have been some changes (dc7af00), but all metadata names remained the same.

tl;dr:

Original post below:

I've changed my format:

"format": "<!DOCTYPE html>\n<html>\n<head>\n<title>{title}</title>\n</head>\n<body>\n<!--Date: {date}-->\n<p>Comment:</p>\n{comment}\n<p>\nBlogpost\\Content:</p>\n<p>{blogpost_text|content_comment}</p><p>Embeds:</p>\n<p>{embed[url]:?/\n/}\n</body>\n</html>",

For id thumb I get None for the title and None for the content. Title should be "Test Fantia Post"

For id 1870739, I get None when I should get "This is an image gallery.", and the title is None when it should be "Test Image Gallery".

For id 1870740 I get None for the title when it should be "Test Blog Content 1". For the content I get "This is an image gallery.", which should have been the content for 1870739. There are two embeds in this post and neither is there.

For id 1870741 I get the content of 1870740, but embeds aren't captured as such:

<p>This is a test.

This is a test.

</p><p>Embeds:</p>
<p>

The title is also still None when it should be "Test Blog Content 2", however I'm now wondering if that should be a subtitle and the main post should be the title of "Test Fantia Post", unless that title should only be applied to the thumbnail.

For id 1870870, I get identical output as 1870741. Title should be "Test Blog Content 3 (Links & HTML Importing?)" and content should be, well, the links and html. There's no embeds here either.

EDIT: title should be content_title. Investigating things further, however.

I see a parent_post with a title in it that I think might be useful to pass as a parent_title, seeing as that doesn't seem to be a used variable right now.

EDIT2: "event": "post", is incorrect. Changing to "event": "prepare", fixed most of my issues.
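
The two edits above (use `content_title` instead of `title`, and `"event": "prepare"` instead of `"event": "post"`) can be collected into one post processor entry. The sketch below builds that entry as a Python dict and serializes it with the standard `json` module, so the output cannot contain a trailing comma. The combination itself is an assumption pieced together from this thread and has not been verified against Fantia; the `.txt` filename pattern is only illustrative.

```python
import json

# Hypothetical combined entry; every field name comes from this thread.
postprocessor = {
    "name": "metadata",
    "event": "prepare",  # per EDIT2: "post" did not give per-content values
    "filename": "{post_id}-{content_id}.txt",
    "format": [
        "{content_title:?//}",  # per EDIT: content_title rather than title
        "{comment:?//}",
        "{blogpost_text|content_comment}",
    ],
}

config = {"extractor": {"fantia": {"postprocessors": [postprocessor]}}}

# json.dumps always emits strict JSON, even though the dict literal above
# uses Python-style trailing commas.
config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed text can be merged into an existing gallery-dl.conf by hand.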

biggestsonicfan commented 1 year ago

I've ended up with this:

        "fantia-metadata2":
        {
            "name": "metadata",
            "event": "prepare",
            "mode": "json",
            "directory": "json",
            "extension-format": "json"
        },

I can literally parse out anything I need from any post with this, even if the post is deleted. Embeds don't seem to work, but I think this is "good enough" for what I need. I wish I had realized a lot earlier that I could do this with all services, for example for people I pledged to on Patreon who no longer have Patreon pages.
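
Parsing the saved JSON files might look like the following minimal sketch. It assumes each metadata file is a single JSON object with a `.json` extension, and it looks only for the text field names mentioned earlier in this thread. The function name `collect_text` and the `TEXT_FIELDS` tuple are illustrative, not part of gallery-dl.

```python
import json
from pathlib import Path

# Text-bearing Fantia fields named in this thread; adjust as needed.
TEXT_FIELDS = ("content_title", "comment", "blogpost_text", "content_comment")


def collect_text(metadata_dir):
    """Map each .json file name to the non-empty text fields found in it."""
    results = {}
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fp:
            data = json.load(fp)
        if not isinstance(data, dict):
            continue  # skip files that are not a single JSON object
        # Keep only fields that are present and not None or empty.
        found = {key: data[key] for key in TEXT_FIELDS if data.get(key)}
        if found:
            results[path.name] = found
    return results
```

Calling `collect_text("path/to/json")` on the directory that holds the metadata files returns a dict keyed by file name, which can then be printed or written out in any format.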

mikf commented 1 year ago

@biggestsonicfan https://github.com/mikf/gallery-dl/commit/c79359eb3a5e04f2830cb9929e5d67119bf89482 should fix some of the issues you mentioned.