mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Site Request] Kemono.party #1216

Closed ghost closed 3 years ago

ghost commented 3 years ago

The site is a Patreon/DLsite/Fantia/etc. scraper. It's a yiff.party alternative now that yiff.party is gone. Downloading 300+ images by hand is kind of a bother, so it would be nice if gallery-dl supported this site.

Hrxn commented 3 years ago

New site, already down? 😄

kattjevfel commented 3 years ago

> New site, already down? 😄

It seems they don't serve plain HTTP, not even a redirect, and browsers don't automatically try HTTPS. https://kemono.party/

Hrxn commented 3 years ago

I did use HTTPS, and I see something, but this is extremely flaky, at least for me. I get 404s and timeouts all the time, some other errors as well.

kattjevfel commented 3 years ago

Ah, yeah I get timeouts too, and here I got my hopes up for getting free high quality anime tiddies!

kattjevfel commented 3 years ago

The site now appears to be working reliably, and while it does list an RSS option, it just 404s, so we're gonna have to go with more caveman approaches.

Example user: https://kemono.party/patreon/user/233822 (NSFW)

each entry looks like:

  <a href="/patreon/user/233822/post/39735940" class="thumb-link">

    <div class="thumb thumb-with-image thumb-standard">
      <img src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png">
      <div class="thumb-with-image-overlay">
        <h3>Adventure Girls - The Suitor</h3>

          <small>2020-07-26 16:57:56</small><br>

          <small>1 attachments</small>

      </div>
    </div>

</a>

some metadata available inside :

      <meta name="service" content="patreon"/>
      <meta name="count" content="1063"/>
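
A minimal sketch of how the listing markup above could be scraped with Python's standard `html.parser`. The class name `ListingParser` and the `SAMPLE` string are made up for illustration; this is not how gallery-dl does it, just one way to pull out the post links and the `<meta>` values shown above.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Hypothetical helper: collect post links and <meta> values from a listing page."""

    def __init__(self):
        super().__init__()
        self.post_links = []  # href of every <a class="thumb-link">
        self.meta = {}        # name -> content for every <meta name=... content=...>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "thumb-link" in classes:
            self.post_links.append(attrs.get("href"))
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content")


# Trimmed copy of the markup quoted above.
SAMPLE = """
<meta name="service" content="patreon"/>
<meta name="count" content="1063"/>
<a href="/patreon/user/233822/post/39735940" class="thumb-link">
  <div class="thumb thumb-with-image thumb-standard">
    <img src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png">
  </div>
</a>
"""

parser = ListingParser()
parser.feed(SAMPLE)
print(parser.post_links)  # ['/patreon/user/233822/post/39735940']
print(parser.meta)        # {'service': 'patreon', 'count': '1063'}
```

Matching on the `thumb-link` class is fragile, since the HTML may change at any time.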

posts look like this:


  <div class="page" id="page">

      <h1>Adventure Girls - The Suitor</h1>

      <p><p></p><p>[Based on Season 5, Episode 21]</p><p>Princess Bubblegum makes a sex robot for local simp Braco, decides to keep it for herself.</p><h3><u><strong>Last post here on Patreon. Moving to Subscribe Star.</strong></u></h3></p>

          <a class="fileThumb" href="/files/233822/39735940/Adventure_Girls_S5_EP21.png">
            <img
              data-src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png"
              src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png"
            >
          </a>
          <br>

          <a class="fileThumb" href="/attachments/233822/39735940/Adventure-Girls-S5-EP21.png">
            <img
              data-src="/thumbnail/attachments/233822/39735940/Adventure-Girls-S5-EP21.png"
              src="/thumbnail/attachments/233822/39735940/Adventure-Girls-S5-EP21.png"
            >
          </a>
          <br>

  </div>

post with only a file attached (and header):

      <h1>June 2020 Art</h1>

        <a href="/attachments/233822/38850658/Jun20.rar" target="_blank">
          Download Jun20.rar
        </a>
        <br>

      <p><p>Did a lot this month.</p></p>

          <a class="fileThumb" href="/files/233822/38850658/Pat_Pack.png">
            <img
              data-src="/thumbnail/files/233822/38850658/Pat_Pack.png"
              src="/thumbnail/files/233822/38850658/Pat_Pack.png"
            >
          </a>
          <br>

some metadata here too:

      <meta name="service" content="patreon"/>
      <meta name="published" content="2020-07-26 16:57:56"/>
      <meta name="added" content="2020-09-16 11:13:47.903080"/>
      <meta name="id" content="39735940"/>

Worth noting: as seen above, when a post has only one image, it is seemingly linked twice. Checking more closely with another account that posts multiple attachments, the href="/files/ link seems to be the "header" image, and the other one is the actual attachment.
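
A rough sketch of that split, assuming the `/files/` vs. `/attachments/` prefix really is the only thing that tells the two apart (that is a guess from the samples above, not something confirmed). The function name `split_post_links` and the `POST_SAMPLE` string are hypothetical.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_post_links(html):
    """Return (header_images, attachments) based on the href prefix."""
    collector = _LinkCollector()
    collector.feed(html)
    header_images = [h for h in collector.hrefs if h.startswith("/files/")]
    attachments = [h for h in collector.hrefs if h.startswith("/attachments/")]
    return header_images, attachments


# Trimmed copy of the "June 2020 Art" post markup quoted above.
POST_SAMPLE = """
<h1>June 2020 Art</h1>
<a href="/attachments/233822/38850658/Jun20.rar" target="_blank">
  Download Jun20.rar
</a>
<a class="fileThumb" href="/files/233822/38850658/Pat_Pack.png">
  <img src="/thumbnail/files/233822/38850658/Pat_Pack.png">
</a>
"""

headers, attachments = split_post_links(POST_SAMPLE)
print(headers)      # ['/files/233822/38850658/Pat_Pack.png']
print(attachments)  # ['/attachments/233822/38850658/Jun20.rar']
```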

Hope this is of any help!

mikf commented 3 years ago

Added some initial support with https://github.com/mikf/gallery-dl/commit/e07dfc4fe5a92e5e02617d41448452e27c4a7f96. It kind of works, but duplicate files will be a problem. There are posts with 4 listed files, all identical and all with a different filename. And from what I can tell, there is no other way to decide whether two files might be the same before downloading and comparing them.
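
For illustration only, here is a sketch of the "download and compare" step done after the fact, by hashing file contents with SHA-256. This is not part of the linked commit; `file_digest` and `find_duplicates` are hypothetical helper names, and the demo files are throwaway temp files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Map the first file seen with some content to later files with identical content."""
    first_seen = {}  # digest -> first path with that digest
    duplicates = {}  # first path -> list of later paths with the same digest
    for path in paths:
        digest = file_digest(path)
        if digest in first_seen:
            duplicates.setdefault(first_seen[digest], []).append(path)
        else:
            first_seen[digest] = path
    return duplicates


# Demo: two identical files under different names, plus one different file.
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = Path(tmp, "a.png"), Path(tmp, "b.png"), Path(tmp, "c.png")
    a.write_bytes(b"same bytes")
    b.write_bytes(b"same bytes")
    c.write_bytes(b"other bytes")
    dupes = find_duplicates([a, b, c])
    demo = {kept.name: [p.name for p in extra] for kept, extra in dupes.items()}

print(demo)  # {'a.png': ['b.png']}
```

This only helps once the files are already on disk, so it does not save any bandwidth.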

kattjevfel commented 3 years ago

I'll take duplicate files over files with conflicting names :P

ghost commented 3 years ago

Hello, I maintain Kemono. Just wanted to add that the software has APIs for both users and posts at /api/<service>/user/<id> and /api/<service>/user/<id>/post/<id> respectively. Think that'd be easier to scrape from instead of the HTML, which may change in the future. Let me know if there's any way I can help 👍
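
A small sketch of building those two URLs. The base URL comes from the links earlier in this thread; the function names are made up, and the format of the responses is not described here, so nothing below tries to parse them.

```python
from urllib.parse import quote

BASE_URL = "https://kemono.party"  # taken from the links in this thread


def user_api_url(service, user_id):
    """Build /api/<service>/user/<id>."""
    return f"{BASE_URL}/api/{quote(str(service), safe='')}/user/{quote(str(user_id), safe='')}"


def post_api_url(service, user_id, post_id):
    """Build /api/<service>/user/<id>/post/<id>."""
    return f"{user_api_url(service, user_id)}/post/{quote(str(post_id), safe='')}"


print(user_api_url("patreon", 233822))
# https://kemono.party/api/patreon/user/233822
print(post_api_url("patreon", 233822, 39735940))
# https://kemono.party/api/patreon/user/233822/post/39735940
```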

mikf commented 3 years ago

@kemono-bugs Thank you very much for mentioning the API endpoints, they are very helpful. Where should I have been looking for them myself? Is there some sort of documentation anywhere I missed?

And another question, if you don't mind: What is with "duplicated" posts like https://kemono.party/patreon/user/94956/post/2337848? Even the API response contains two entries, both for post ID 2337848. Did something just go wrong when importing that post?

ghost commented 3 years ago

@mikf

> Where should I have been looking for them myself? Is there some sort of documentation anywhere I missed?

Unfortunately no, I haven't had the time to write documentation proper. All API routes can easily be found in the source code though.

> And another question, if you don't mind: What is with "duplicated" posts like https://kemono.party/patreon/user/94956/post/2337848?

The site was originally designed to allow detected revisions of a post to be stored under the same ID as their parent. However, this also let unintended duplicates slip in, mostly due to user error (namely, clicking submit multiple times). I'm currently working on an update that will clean up the unintentional dupes and limit the number of posts for a single ID to one.
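
Until that cleanup lands, a client could collapse repeated entries itself. A minimal sketch, assuming the API response is a list of dicts and that each one carries the post ID under an `"id"` key; that key name is a guess based on the `<meta name="id">` tag in the HTML and is not confirmed anywhere in this thread. `unique_posts` is a hypothetical helper.

```python
def unique_posts(entries, key="id"):
    """Keep only the first entry for each post ID, preserving order."""
    seen = set()
    result = []
    for entry in entries:
        post_id = entry[key]
        if post_id in seen:
            continue  # a later entry with an ID we already kept
        seen.add(post_id)
        result.append(entry)
    return result


# Made-up sample data: two entries share the same ID.
sample = [
    {"id": "2337848", "title": "first copy"},
    {"id": "2337848", "title": "second copy"},
    {"id": "2337849", "title": "another post"},
]
print([p["title"] for p in unique_posts(sample)])
# ['first copy', 'another post']
```

Keeping only the first entry would also throw away genuine revisions stored under the same ID, so this is only appropriate if revisions are not wanted.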

ghost commented 3 years ago

Unsure if this is out of scope for gallery-dl, but would it be possible to add options to download text-only posts, as well as the text that accompanies other posts? One idea: give each of a user's posts its own folder, containing the images, text, and anything else the post might have had.