cinnamon-bun closed 3 years ago
(For context: I'm making an indexer and gallery viewer for gallery-dl archives, so I want to preserve as much info as possible about authorship, comic sequences, etc.)
Commit https://github.com/mikf/gallery-dl/commit/89a2bcbb2dcbe7cd4efc6066b2c64b1793270300 adds a descriptions option for furaffinity, which allows you to disable any description text processing like remove_html, although there might be differences between the old and new FA layout:
$ gallery-dl -j -o descriptions=text https://www.furaffinity.net/view/35225276
...
"description": "Ipad Commission for with adorable spooky puppy! ywy \r\n\r\n✨ My twitter ✨",
$ gallery-dl -j -o descriptions=html https://www.furaffinity.net/view/35225276
...
"description": "</td>\n </tr>\n <tr>\n <td valign=\"top\" align=\"left\" width=\"70%\" class=\"alt1\" style=\"padding:8px\">\n Ipad Commission for <a href=\"/user/spookielee\" class=\"iconusername\"><img src=\"//a.facdn.net/20210120/spookielee.gif\" align=\"middle\" title=\"SpookieLee\" alt=\"SpookieLee\" /></a> with adorable spooky puppy! <i class=\"smilie love\"></i> ywy <i class=\"smilie love\"></i><br />\r\n<br />\r\n✨ <a class=\"auto_link named_url\" href=\"https://twitter.com/UlitochkaArt\">My twitter</a> ✨\n </td>\n </tr>",
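For an indexer that consumes the descriptions=html output, the links can be recovered from the raw description with the standard library alone. The following is only a sketch under assumptions, not part of gallery-dl: the names LinkCollector and extract_links are hypothetical, the base URL is assumed from the page address used above, and the fallback to the img alt text is a guess at how to label icon links such as the user mention.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (url, label) pairs from <a> tags (hypothetical helper)."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base     # used to resolve relative hrefs like /user/...
        self.links = []      # finished (url, label) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>
        self._alt = None     # alt text of an <img> inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []
            self._alt = None
        elif tag == "img" and self._href is not None:
            self._alt = attrs.get("alt")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Prefer the visible link text, fall back to the icon's alt text.
            label = "".join(self._text).strip() or self._alt or ""
            self.links.append((urljoin(self.base, self._href), label))
            self._href = None


def extract_links(description, base="https://www.furaffinity.net/"):
    """Return all links found in an HTML description string."""
    parser = LinkCollector(base)
    parser.feed(description)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        'Ipad Commission for <a href="/user/spookielee" class="iconusername">'
        '<img src="//a.facdn.net/20210120/spookielee.gif" alt="SpookieLee" /></a>'
        ' with adorable spooky puppy! '
        '<a class="auto_link named_url" href="https://twitter.com/UlitochkaArt">'
        'My twitter</a>'
    )
    for url, label in extract_links(sample):
        print(label, url)
```

Stray closing tags such as the leading </td> in the output above are ignored by the parser, so the description value can be fed in as-is after JSON decoding.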
Thank you!
The description sometimes contains important links like:
The description's HTML is removed and that info is lost:
https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/furaffinity.py#L83-L113
Would it be possible to keep the HTML, or at least <a> tags? (What's the general policy of gallery-dl about HTML?)
Example pages to test on (NSFW)
"Ipad Commission for with adorable spooky puppy!" without the account name