mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Preserve links or HTML in Furaffinity descriptions? #1231

Closed: cinnamon-bun closed this issue 3 years ago

cinnamon-bun commented 3 years ago

The description sometimes contains important links like:

The description's HTML is removed and that info is lost:

https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/furaffinity.py#L83-L113

            # new site layout (rh is text.remove_html)
            data["description"] = text.unescape(rh(extr(
                'class="section-body">', '</div>'), "", ""))

            # old site layout
            data["description"] = text.unescape(text.remove_html(extr(
                "</table>", "</table>"), "", ""))

Would it be possible to keep the HTML, or at least the <a> tags? (What is gallery-dl's general policy on HTML?)
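remove_html drops every tag, so a link's target disappears and only its visible text survives. Below is a minimal sketch of the middle ground asked about here, using only Python's standard html.parser. It keeps the plain text but appends each <a> element's href after that element's text. The class and function names are hypothetical and are not part of gallery-dl.

```python
from html.parser import HTMLParser


class LinkPreservingStripper(HTMLParser):
    """Collect text content, appending each <a> element's href as " <href>"."""

    def __init__(self):
        # convert_charrefs=True makes handle_data receive already-unescaped text
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hrefs = []  # stack of hrefs for the currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        elif tag == "br":
            # keep line breaks instead of silently dropping them
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href:
                self.parts.append(f" <{href}>")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_keep_links(html):
    """Hypothetical helper: remove all tags but keep link targets."""
    parser = LinkPreservingStripper()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_html_keep_links(
        '<a class="auto_link named_url" '
        'href="https://twitter.com/UlitochkaArt">My twitter</a>'))
```

Appending the href in angle brackets is only one possible convention. A viewer that renders Markdown might prefer [text](href) instead.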


Example pages to test on (NSFW)

cinnamon-bun commented 3 years ago

(For context: I'm making an indexer and gallery viewer for gallery-dl archives, so I want to preserve as much information as possible about authorship, comic sequences, etc.)

mikf commented 3 years ago

Commit https://github.com/mikf/gallery-dl/commit/89a2bcbb2dcbe7cd4efc6066b2c64b1793270300 adds a descriptions option for furaffinity. Setting it to html disables all description text processing, such as remove_html, while descriptions=text keeps the stripped plain-text output. Note that the raw HTML may differ between the old and new FA layouts:

$ gallery-dl -j -o descriptions=text https://www.furaffinity.net/view/35225276
...
      "description": "Ipad Commission for  with adorable spooky puppy!  ywy \r\n\r\n✨ My twitter ✨",

$ gallery-dl -j -o descriptions=html https://www.furaffinity.net/view/35225276
...
      "description": "</td>\n                </tr>\n                <tr>\n                    <td valign=\"top\" align=\"left\" width=\"70%\" class=\"alt1\" style=\"padding:8px\">\n                        Ipad Commission for <a href=\"/user/spookielee\" class=\"iconusername\"><img src=\"//a.facdn.net/20210120/spookielee.gif\" align=\"middle\" title=\"SpookieLee\" alt=\"SpookieLee\" /></a> with adorable spooky puppy! <i class=\"smilie love\"></i> ywy <i class=\"smilie love\"></i><br />\r\n<br />\r\n✨ <a class=\"auto_link named_url\" href=\"https://twitter.com/UlitochkaArt\">My twitter</a> ✨\n                                            </td>\n                </tr>",
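
With descriptions=html, an indexer can recover the links itself. Below is a minimal sketch using Python's standard html.parser and urllib.parse.urljoin. It collects every <a href> in a description string and resolves relative paths such as /user/spookielee against https://www.furaffinity.net/, which is an assumption about how FA's relative links should be interpreted. The names LinkCollector, extract_links, and BASE_URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: relative hrefs in FA descriptions are relative to the site root
BASE_URL = "https://www.furaffinity.net/"


class LinkCollector(HTMLParser):
    """Record the absolute URL of every <a href="..."> in the input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # absolute URLs pass through urljoin unchanged
                self.links.append(urljoin(BASE_URL, href))


def extract_links(description_html):
    collector = LinkCollector()
    # stray closing tags such as the leading </td> are tolerated by HTMLParser
    collector.feed(description_html)
    collector.close()
    return collector.links
```

Fed the description above, this should yield the profile link of the commissioned user (lost entirely in the text output) and the Twitter link.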

cinnamon-bun commented 3 years ago

Thank you!