philomena-dev / philomena

Next-generation imageboard
GNU Affero General Public License v3.0

InkBunny Scraper #228

Closed: chaskayote closed this issue 2 months ago

chaskayote commented 2 months ago

Hear me out! While InkBunny is one of the few furry sites that allows cub, and is therefore known as a cub site now, it has many fabulous artists on it that don't do cub, such as Dripponi, TheSecretCave, Chunie, Tsaiwolf, and more! Unlike FA, SoFurry, etc., InkBunny has an amazing API, so it should be super easy to write a scraper for.

To get an SID, make an InkBunny account, go to the account settings on InkBunny, and enable API access.

Then request https://inkbunny.net/api_login.php?username=derpibooru&password=hunter2 and it will output the SID in JSON. This way, you could have people log in from their user settings or, as I'd suggest, put one SID in the config.
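
As a rough sketch of that login step, the snippet below builds the login URL and pulls the SID out of an already-decoded response. The module name `InkbunnyLoginSketch` is made up for illustration, and the `"sid"` key is an assumption about the shape of the JSON. The HTTP request and JSON decoding are left out so the example has no dependencies.

```elixir
defmodule InkbunnyLoginSketch do
  @login_endpoint "https://inkbunny.net/api_login.php"

  # Builds the login URL; URI.encode_query/1 escapes the values.
  def login_url(username, password) do
    @login_endpoint <> "?" <> URI.encode_query(username: username, password: password)
  end

  # Takes the decoded JSON body (a map) and returns the SID.
  # Assumption: the SID lives under a "sid" key.
  def extract_sid(%{"sid" => sid}) when is_binary(sid), do: {:ok, sid}
  def extract_sid(_other), do: {:error, :no_sid}
end
```

With one SID stored in the config, only the server ever needs the account credentials.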

You can also log in as guest with https://inkbunny.net/api_login.php?username=guest (guests can only view posts with a General rating). That also outputs an SID, which you can use to view a post by converting https://inkbunny.net/s/2436862 -> https://inkbunny.net/api_submissions.php?show_description=yes&sid=4X88ktV7jxywp65Ng40ez1qTJd&submission_ids=2436862 which gives you a beautiful JSON to mine for info. You'd probably only pull file_url_full for the file, username for the artist namespace, and description for the description. Obviously the tags are free-form and would likely just muddy the database.
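
The URL conversion and field selection described above could look roughly like this. `InkbunnySubmissionSketch` and its function names are hypothetical, and the assumption that each entry under `"files"` carries its own `"file_url_full"` is a guess about the JSON layout, not something confirmed here.

```elixir
defmodule InkbunnySubmissionSketch do
  @api_endpoint "https://inkbunny.net/api_submissions.php"

  # Converts a submission page URL into the matching API URL.
  def api_url(page_url, sid) do
    regex = ~r|\Ahttps?://inkbunny\.net/s/(\d+)|

    case Regex.run(regex, page_url, capture: :all_but_first) do
      [id] ->
        query = URI.encode_query(show_description: "yes", sid: sid, submission_ids: id)
        {:ok, @api_endpoint <> "?" <> query}

      nil ->
        {:error, :not_a_submission_url}
    end
  end

  # Picks the three fields mentioned above out of one decoded submission.
  # Assumption: each entry in "files" carries its own "file_url_full".
  def extract(submission) do
    %{
      author_name: submission["username"],
      description: submission["description"],
      images: Enum.map(submission["files"] || [], & &1["file_url_full"])
    }
  end
end
```

Tags are deliberately not read, in line with the point about free-form tags muddying the database.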

Is your feature request related to a problem? Please describe.
You can only use direct image links, and the scraper struggles when the image is behind an auth wall (anything Mature+).

Describe the solution you'd like
Either hard-code an SID tied to one username in the config, or allow users to enter their own InkBunny SID.

Describe alternatives you've considered
Put in the direct image link.

chaskayote commented 2 months ago

I tried putting together a scraper .ex file but haven't tested it yet because I know it's wrong. I can't figure out how to pull the submission ID from the URL to plug it into the API.

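defmodule Philomena.Scrapers.Inkbunny do
  # Matches submission pages such as https://inkbunny.net/s/2436862
  @url_regex ~r|\Ahttps?://inkbunny\.net/s/(\d+)/?|

  @spec can_handle?(URI.t(), String.t()) :: boolean()
  def can_handle?(_uri, url) do
    String.match?(url, @url_regex)
  end

  def scrape(_uri, url) do
    # Capture only the numeric submission ID (:last is not a valid capture option).
    [submission_id] = Regex.run(@url_regex, url, capture: :all_but_first)

    query =
      URI.encode_query(
        show_description: "yes",
        sid: inkbunny_sid(),
        submission_ids: submission_id
      )

    api_url = "https://inkbunny.net/api_submissions.php?" <> query
    {:ok, %Tesla.Env{status: 200, body: body}} = Philomena.Http.get(api_url)

    # Assumes the API returns a list under "submissions" and that each
    # submission lists its files under "files".
    %{"submissions" => [submission | _]} = Jason.decode!(body)

    # Assumption: a plain list of URLs is enough here; wrap each entry in
    # whatever shape the other scrapers use if they differ.
    images = Enum.map(submission["files"], & &1["file_url_full"])

    %{
      # Use the URL the user supplied as the source.
      source_url: url,
      author_name: submission["username"],
      description: submission["description"],
      images: images
    }
  end

  # Assumption: the SID is stored in the application config under a
  # hypothetical :inkbunny_sid key.
  defp inkbunny_sid do
    Application.get_env(:philomena, :inkbunny_sid)
  end
end

liamwhite commented 2 months ago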

At a glance, it seems like it should work. Is there a specific issue you are having?

chaskayote commented 2 months ago

I got it working! Obviously, I also had to update the runtime config and the scrapers file. I got it working for FurAffinity and e621 too. NSFW works without issue.
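
For anyone following along, registering a new scraper usually comes down to adding its module to a list that is searched in order. The sketch below is a self-contained illustration of that idea, not the actual contents of Philomena's scrapers file; `ScraperDispatchSketch`, `find_scraper/1`, and the two stand-in modules are all hypothetical.

```elixir
defmodule ScraperDispatchSketch do
  # Two stand-in scrapers; only can_handle?/2 matters for dispatch.
  defmodule Inkbunny do
    def can_handle?(_uri, url) do
      String.match?(url, ~r|\Ahttps?://inkbunny\.net/s/\d+|)
    end
  end

  defmodule Raw do
    # Fallback that accepts any URL.
    def can_handle?(_uri, _url), do: true
  end

  # Order matters: specific scrapers first, the catch-all last.
  @scrapers [Inkbunny, Raw]

  # Returns the first scraper module that claims the URL, or nil.
  def find_scraper(url) do
    uri = URI.parse(url)
    Enum.find(@scrapers, fn scraper -> scraper.can_handle?(uri, url) end)
  end
end
```

A site-specific scraper has to come before any catch-all entry, otherwise it is never reached.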

I'm working on Pixiv but it's a little tricky.

How can I submit my code to this repository? Any policies on that?

liamwhite commented 2 months ago

To contribute, use the GitHub UI to fork the repository under your own account, add the relevant code to a new branch, and use the GitHub UI to submit a pull request.