stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0
8.43k stars, 746 forks

[Feature] Image Scraping #3748

Open · opened by myalow 1 year ago

myalow commented 1 year ago

Is your feature request related to a problem? Please describe.
For a while, I've been considering using Stash as a replacement for Hydrus, but the main thing standing in my way is that there's no way to scrape tags for images.

Describe the solution you'd like
I'd like to see images brought to parity with scenes and performers, so that I can scrape them both through a Stash-box instance and through local scrapers. I have ideas on how this could be handled on Stash-box's side, but that's a separate issue. I do, however, think MD5 hashes would work fine in lieu of pHashes for images.

Describe alternatives you've considered
There really is no alternative besides writing an external script to semi-automate this, lol.

Additional context
My main use would be for hentai, where most scrapers can just search md5:<hash> for a given image on a booru. That's less of an option for IRL media, given how it is catalogued and distributed, which is why I think Stash-box integration would still be beneficial. Perhaps, for images, Stash-box could have a stashID for a given image, with a list of MD5 hashes and tags tied to that stashID? Again, though, for my use case I just want to be able to scrape images by hash en masse. I welcome any replies that build on how I see this being implemented.
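The md5:<hash> lookup described above starts from each file's MD5 digest. Below is a minimal Python sketch of the local half of such an external script, assuming the target booru accepts an md5:<hash> search term as described. The helper names (`file_md5`, `md5_search_query`, `queries_for_folder`) and the set of image suffixes are illustrative, not part of Stash, and no request is sent to any site. Keep in mind that MD5 only matches byte-identical files: a re-encoded or resized copy will hash differently, which is where a perceptual hash such as pHash behaves differently.

```python
import hashlib
from pathlib import Path
from typing import Iterator, Tuple

# Illustrative set of extensions treated as images; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_search_query(md5_hex: str) -> str:
    """Build the md5:<hash> search term (assumes the booru uses lowercase hex)."""
    return f"md5:{md5_hex.lower()}"


def queries_for_folder(folder: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (image path, search term) for every image file under folder, recursively."""
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path, md5_search_query(file_md5(path))
```

For example, `for path, query in queries_for_folder(Path("~/Pictures").expanduser()): print(path, query)` prints one search term per image. Turning each term into an actual request, and mapping the results back to tags, depends on the site and is left out.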

ghost commented 1 year ago

This would be a continuation of https://github.com/stashapp/stash/pull/2885, but I personally stopped working on it. Maybe someone will pick it up one day.

As someone who uses Stash mostly for images, I think you're probably better off with Hydrus for now. Most of the work on Stash goes toward video, and trying to use it for images (on mobile, for example) becomes frustrating really quickly. Image scraping could be used to get data from boorus, or simply from a social media post, but neither of those use cases is targeted by Stash (so far). I also had to write lots of custom CSS to make it semi-decent to use, since the card mode really isn't made for browsing and the wall view doesn't make it easy to see image tags. That was the purpose of this PR: https://github.com/stashapp/stash/pull/2970

But all of this is a matter of opinion, of course. Just know that I have been told countless times on Discord that there is almost no benefit to image scraping if you consider Stash to be made only for galleries and scenes. The Images page is just a gallery browser without the grouping. Finally, I made a PR (merged: https://github.com/stashapp/stash/pull/2837) that allows custom localization. With it, you can give your Stash instance an H-themed flavor, such as having "characters" with "races" instead of "performers" with "nationality".

Dounial commented 8 months ago

I might have misunderstood your exact use case, but if you want to scrape pictures by looking for similar ones, you could maybe modify this extension. Which browser do you use? On Chrome and Chromium-based browsers you can use this: Chrome (&Chromium). On Firefox you can use this: Firefox. And if you really use Edge (which I doubt), you can use this: Edge.

If you combine it with another extension that launches an external application (my recommendation: External Application Launcher for Chromium), you could easily build a scraper out of it with some coding skills. That would be even better, because it would be your own code: you could share it and use it with confidence, since you wrote it. :)

Hope this helps at least a bit. If it's not welcome, just ask and I will delete this comment.

toddhow commented 6 months ago

Personally, I just want the necessary functions and endpoints to be added so that third-party scrapers can be developed.

echo6ix commented 1 month ago

The use case for this is a lot broader than OP's. Videos (i.e. clips) can also be categorized as images via the library.

Many bulk downloaders put some part of the URL into the filename schema. Unfortunately, the Scene Filename Parser cannot be used for images (nor can it populate the URL field), but a third-party script could populate the URL fields from the filename schema. The laborious part would be manually clicking the scrape button for each image, instead of having something like Tagger or Identify that can process them in bulk based on the URL field.
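The filename-to-URL step could look roughly like the sketch below. It assumes a hypothetical downloader schema `<site>_<post id>[_<index>].<ext>` and a placeholder URL template for a made-up `examplebooru` site; both would have to be adapted to the downloader and sites actually in use. The sketch only recovers the URLs. Writing them into each image's URL field (for example through Stash's GraphQL API) and triggering the scrape afterwards are not shown.

```python
import re
from pathlib import PurePath
from typing import Dict, Iterable, Optional

# Hypothetical filename schema: "<site>_<post id>[_<index>].<ext>",
# e.g. "examplebooru_123456_01.jpg". Adjust to your downloader's naming settings.
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[a-z0-9]+)_(?P<post_id>\d+)(?:_\d+)?$", re.IGNORECASE
)

# Hypothetical site -> URL templates; the host below is a placeholder.
URL_TEMPLATES: Dict[str, str] = {
    "examplebooru": "https://examplebooru.example/posts/{post_id}",
}


def url_from_filename(filename: str) -> Optional[str]:
    """Recover a source URL from a filename, or None if it doesn't fit the schema."""
    stem = PurePath(filename).stem  # drop directories and the extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return None
    template = URL_TEMPLATES.get(match["site"].lower())
    if template is None:
        return None  # known schema, but no URL template for this site
    return template.format(post_id=match["post_id"])


def urls_for_files(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each filename that yields a URL to that URL; unmatched files are skipped."""
    return {
        name: url
        for name in filenames
        if (url := url_from_filename(name)) is not None
    }
```

Unmatched files are skipped rather than guessed at, so only images whose names clearly follow the schema would get a URL, leaving the rest for manual handling.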