stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[RFC] Adding Hashes to Galleries for Potential Stash-Box Integration #3496

Open SpedNSFW opened 1 year ago

SpedNSFW commented 1 year ago

Scope

The idea here is to add hashes to galleries so that gallery metadata can be matched against a stash-box instance in the same way scenes currently are.

Long Form

Something I've been thinking about for a few days is the potential expansion of stash-box to include galleries. After scenes, galleries are the most widely produced content, with some studios producing them almost exclusively. Because of this, I believe galleries are the next logical step.

In order to link content from stash to stash-box, we're likely going to need hashes for galleries. Since galleries can be created from either an archive or a plain directory, this is potentially a problem; however, I had an idea that may solve it (I can't promise it's any good).

Hashing an archive should be simple, but directories are trickier because file naming structures can differ between copies of the same gallery. I was thinking we could hash each image, sort all the resulting hashes alphabetically, concatenate them into a single string, and then hash that entire string. In theory, that should yield a consistent hash regardless of file names.
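A minimal Go sketch of the order-independent hash described above, assuming the image bytes have already been read from the directory or archive. SHA-256 and the name `galleryHash` are illustrative assumptions; the proposal does not name a hash algorithm, and this is not existing stash code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// galleryHash (hypothetical) hashes each image, sorts the hex digests,
// joins them into one string, and hashes that string.
// SHA-256 is an assumption; the proposal does not specify an algorithm.
func galleryHash(images [][]byte) string {
	digests := make([]string, 0, len(images))
	for _, img := range images {
		sum := sha256.Sum256(img)
		digests = append(digests, hex.EncodeToString(sum[:]))
	}
	// Sorting removes any dependence on file names or directory order.
	sort.Strings(digests)
	// Every digest has the same fixed length, so joining without a
	// separator is still unambiguous.
	combined := sha256.Sum256([]byte(strings.Join(digests, "")))
	return hex.EncodeToString(combined[:])
}

func main() {
	// The same two images, listed in different orders.
	a := [][]byte{[]byte("image one"), []byte("image two")}
	b := [][]byte{[]byte("image two"), []byte("image one")}
	fmt.Println(galleryHash(a) == galleryHash(b)) // true
}
```

Note that this remains an exact hash: changing a single byte in any image changes the gallery hash.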

Within the tagger view for galleries, matches could be displayed almost identically to how scenes are displayed. Instead of duration matches, however, you could use image counts.
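As a rough illustration, an image-count check could take the place of the duration check. The `GalleryCandidate` type and the `imageCountMatch` function below are hypothetical and do not exist in stash or stash-box today.

```go
package main

import "fmt"

// GalleryCandidate is a hypothetical stand-in for a stash-box search result.
type GalleryCandidate struct {
	Title      string
	ImageCount int
}

// imageCountMatch (hypothetical) reports whether the local and remote
// image counts agree, along with the signed difference (remote - local).
func imageCountMatch(localCount int, c GalleryCandidate) (bool, int) {
	diff := c.ImageCount - localCount
	return diff == 0, diff
}

func main() {
	local := 120
	candidates := []GalleryCandidate{
		{Title: "Set A", ImageCount: 120},
		{Title: "Set B", ImageCount: 118},
	}
	for _, c := range candidates {
		ok, diff := imageCountMatch(local, c)
		fmt.Printf("%s: match=%v diff=%+d\n", c.Title, ok, diff)
	}
}
```

Returning the difference rather than only a boolean would let the tagger show near matches (for example, a copy missing a couple of images) in the same way it can show close durations.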

Before any of this goes ahead, though, galleries need a bit of an overhaul: for example, a manual way of setting a cover image, or having edits to a gallery apply to its images (or vice versa).

Some errant thoughts

Because galleries can easily contain over a thousand images, I believe adding individual images to stash-box might not be the best idea. Manually adding, editing, and approving hundreds of thousands of images just sounds bad. Galleries should be just as simple to handle as scenes.

puc9 commented 1 year ago

I'd also like the same integration for galleries as for scenes. However, I think a plain exact hash is not good enough, because it only matches when the galleries are entirely identical.

The great advantage stash has for scenes over other such apps is the perceptual hash, which helps you find the same scene even when it has been encoded differently.
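To show why a perceptual hash tolerates re-encoding where an exact hash does not, here is a minimal Go sketch. It assumes 64-bit perceptual hashes compared by Hamming distance, which is the usual approach for pHash-style hashes. The `similar` helper and the threshold of 8 bits are illustrative assumptions, not stash settings.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the bits that differ between two 64-bit hashes.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// similar (hypothetical) reports whether two hashes are within maxDistance
// bits of each other.
func similar(a, b uint64, maxDistance int) bool {
	return hammingDistance(a, b) <= maxDistance
}

func main() {
	original := uint64(0xF0F0F0F0F0F0F0F0)
	reencoded := original ^ 0b101 // two bits flipped, as a re-encode might do
	unrelated := uint64(0x0F0F0F0F0F0F0F0F)

	fmt.Println(hammingDistance(original, reencoded)) // 2
	fmt.Println(similar(original, reencoded, 8))      // true
	fmt.Println(similar(original, unrelated, 8))      // false
}
```

An exact hash of the two versions would share nothing, while the perceptual hashes differ by only a couple of bits.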

IMHO, we would also need something similar for a full gallery: a kind of perceptual hash for the whole gallery. Since building a perceptual hash from too many images loses too much of the original information, I think there needs to be a maximum number of images used to build a PHASH. For extra-large galleries, I can think of two ways to still create a PHASH:

Individual images are indeed a tougher call. My opinion is that once galleries are implemented, it will become clearer whether individual images are needed or whether galleries alone are enough.

Great idea to start this!