mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[fantia] Question about fantia scraping and extractor configuration #4064

etui-null closed 1 year ago

etui-null commented 1 year ago

I want to start archiving a few fantia accounts I'm subscribed to, but I couldn't figure out how to reproduce the folder structure I've been using manually, or how to scrape all the text I'm interested in. On fantia, creators can paywall specific sections of a post behind different tiers. I like to keep each of these sections in its own folder; however, when I checked the keywords available for a link, I couldn't find anything that would let me separate the sections.

I picked a random post from the front page that should illustrate what I'm talking about: https://fantia.jp/posts/1964477. Word of warning: this link is NSFW. It's downright impossible to find SFW links on fantia.

> gallery-dl -K "https://fantia.jp/posts/1964477"
Keywords for directory names:
-----------------------------
category
  fantia
comment
  🐮見放題プランの方向けに3年前の投稿を公開していきます🐮
いつも支援して頂きありがとうございます!
For Unlimited plan subscribers, posts from three years ago are now available! Thank you for your support!

⇩🐧下スクロールでエロ差分🐧⇩

date
  2023-05-14 09:00:00
fanclub_id
  2931
fanclub_name
  🐧軒下の猫屋🐧
fanclub_url
  https://fantia.jp/fanclubs/2931
fanclub_user_id
  86032
fanclub_user_name
  アルデヒド
post_id
  1964477
post_title
  【見放題プラン】グラブルなまあし部 エウロペさん
post_url
  https://fantia.jp/posts/1964477
posted_at
  Sun, 14 May 2023 18:00:00 +0900
rating
  adult
subcategory
  post
tags[N]['name']
  グラブル
tags[N]['uri']
  /fanclubs/2931/posts?tag=%E3%82%B0%E3%83%A9%E3%83%96%E3%83%AB

Keywords for filenames and --filter:
------------------------------------
category
  fantia
comment
  🐮見放題プランの方向けに3年前の投稿を公開していきます🐮
いつも支援して頂きありがとうございます!
For Unlimited plan subscribers, posts from three years ago are now available! Thank you for your support!

⇩🐧下スクロールでエロ差分🐧⇩

content_category
  thumb
content_filename

date
  2023-05-14 09:00:00
extension
  png
fanclub_id
  2931
fanclub_name
  🐧軒下の猫屋🐧
fanclub_url
  https://fantia.jp/fanclubs/2931
fanclub_user_id
  86032
fanclub_user_name
  アルデヒド
file_id
  thumb
file_url
  https://c.fantia.jp/uploads/post/file/1964477/48a60e5f-1487-4063-b952-1e0e87b49c3d.png
filename
  48a60e5f-1487-4063-b952-1e0e87b49c3d
num
  1
post_id
  1964477
post_title
  【見放題プラン】グラブルなまあし部 エウロペさん
post_url
  https://fantia.jp/posts/1964477
posted_at
  Sun, 14 May 2023 18:00:00 +0900
rating
  adult
subcategory
  post
tags[N]['name']
  グラブル
tags[N]['uri']
  /fanclubs/2931/posts?tag=%E3%82%B0%E3%83%A9%E3%83%96%E3%83%AB

And here's an image of what the post sections look like: [image]

Each section can have a title as well as an additional comment, if one was written. If all of this has already been implemented, could someone help me build the extractor config and postprocessor settings needed to download the contents of each tier into its own folder, and to write a text file with any relevant info into that folder?
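For reference, here is a rough sketch of the kind of config being asked about, written as a small Python script that prints a gallery-dl config as JSON. It only uses keywords that appear in the `-K` output above, so it does not split files per tier: no per-section keyword is listed there, and `content_category` shows up only among the filename keywords, which is why it is used as a filename prefix rather than a folder level. The folder layout, the `{post_id}_info.txt` name, and the text template are illustrative assumptions. The `metadata` postprocessor options (`event`, `mode`, `filename`, `content-format`) are written as I understand them from the gallery-dl configuration docs and should be checked against the installed version.

```python
import json


def build_fantia_config():
    """Return a gallery-dl config dict for fantia (illustrative sketch only)."""
    return {
        "extractor": {
            "fantia": {
                # Uses only keywords listed under "Keywords for directory names".
                "directory": ["{category}", "{fanclub_id}", "{post_id}"],
                # content_category is a filename-only keyword in the -K output,
                # so it is used as a prefix here instead of a subfolder.
                "filename": "{content_category}_{num:>03}_{file_id}.{extension}",
                "postprocessors": [
                    {
                        # Assumption: metadata postprocessor run once per post,
                        # writing a custom text file into the post folder.
                        "name": "metadata",
                        "event": "post",
                        "mode": "custom",
                        "filename": "{post_id}_info.txt",
                        "content-format": (
                            "{post_title}\n{post_url}\n{posted_at}\n\n{comment}\n"
                        ),
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    # ensure_ascii=False keeps any non-ASCII text readable in the output.
    print(json.dumps(build_fantia_config(), indent=4, ensure_ascii=False))
```

The printed JSON could be saved to a file and passed to gallery-dl as a config file. Splitting files into one folder per tier would still need a keyword identifying each section, which the output above does not show.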

etui-null commented 1 year ago

I just realized this should have gone in Discussions. Apologies.