stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[Feature] AHash fingerprint #4083

Status: Open. stg-annon opened this issue 1 year ago.

stg-annon commented 1 year ago

Is your feature request related to a problem? Please describe.

Using a scene's audio to identify the file has been discussed numerous times on Discord and in a few issues. I could not find a dedicated issue to track discussion of the topic, so this issue is for that.

Describe the solution you'd like

Add an audio hash fingerprint type to scenes (AHash)

The existing Chromaprint library could be used for this. It is the same tool used by MusicBrainz, the inspiration for stash-box. For our purposes, go-fingerprint may be more suitable.
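One rough way to wire Chromaprint into a Go codebase like Stash is to shell out to `fpcalc`, the command-line tool that ships with Chromaprint, and parse its `-raw` output. That output is a `DURATION=` line plus a `FINGERPRINT=` line of comma-separated integers. The sketch below assumes `fpcalc` is on `PATH`. The function names are hypothetical, and this is not how Stash or go-fingerprint actually do it. Each value is parsed as a signed 64-bit integer and truncated to `uint32`, so both signed and unsigned output forms are accepted. Note that `fpcalc` only analyses a limited prefix of the audio by default; its `-length` option controls this, which matters if the goal is a whole-track fingerprint.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseFpcalcRaw extracts the duration (seconds) and raw fingerprint from the
// text printed by `fpcalc -raw`. Unknown lines (e.g. FILE=...) are ignored.
func parseFpcalcRaw(out string) (int, []uint32, error) {
	duration := -1
	var fp []uint32
	for _, line := range strings.Split(out, "\n") {
		key, val, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok {
			continue
		}
		switch key {
		case "DURATION":
			d, err := strconv.Atoi(val)
			if err != nil {
				return 0, nil, fmt.Errorf("bad duration %q: %w", val, err)
			}
			duration = d
		case "FINGERPRINT":
			for _, s := range strings.Split(val, ",") {
				// Parse as int64 so both signed and unsigned 32-bit values fit,
				// then truncate to the 32-bit sub-fingerprint.
				v, err := strconv.ParseInt(s, 10, 64)
				if err != nil {
					return 0, nil, fmt.Errorf("bad fingerprint value %q: %w", s, err)
				}
				fp = append(fp, uint32(v))
			}
		}
	}
	if duration < 0 || len(fp) == 0 {
		return 0, nil, fmt.Errorf("fpcalc output missing DURATION or FINGERPRINT")
	}
	return duration, fp, nil
}

// fingerprintFile is a hypothetical helper that runs fpcalc on a media file.
func fingerprintFile(path string) (int, []uint32, error) {
	out, err := exec.Command("fpcalc", "-raw", path).Output()
	if err != nil {
		return 0, nil, err
	}
	return parseFpcalcRaw(string(out))
}

func main() {
	d, fp, err := parseFpcalcRaw("DURATION=42\nFINGERPRINT=1,-2,3\n")
	fmt.Println(d, fp, err)
}
```

Keeping the parser separate from the `exec` call makes it testable without Chromaprint installed.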

This is useful in cases where a scene has the same video but different audio: the PHash will show a strong correlation, but the scenes are actually different. One example I am aware of is Spanish dubs of popular MindGeek scenes. Having an audio hash would allow users to identify these discrepancies.
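The dub case comes down to combining two independent signals. This is a minimal sketch, assuming PHashes are compared as 64-bit values by Hamming distance and that some audio similarity score in [0, 1] is available. The thresholds, the `Verdict` type, and the function names are hypothetical placeholders, not values or APIs from Stash.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Hypothetical thresholds; real values would need tuning against known pairs.
const (
	maxPHashDistance = 8    // max Hamming distance for "same video"
	minAudioSim      = 0.85 // min audio similarity for "same audio"
)

type Verdict string

const (
	SameScene      Verdict = "same scene"
	DifferentAudio Verdict = "same video, different audio (e.g. a dub)"
	DifferentVideo Verdict = "same audio, different video (e.g. a format conversion)"
	Unrelated      Verdict = "unrelated"
)

// phashDistance counts differing bits between two 64-bit perceptual hashes.
func phashDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// classify combines the video (PHash) and audio (AHash) signals.
func classify(phashA, phashB uint64, audioSim float64) Verdict {
	videoMatch := phashDistance(phashA, phashB) <= maxPHashDistance
	audioMatch := audioSim >= minAudioSim
	switch {
	case videoMatch && audioMatch:
		return SameScene
	case videoMatch:
		return DifferentAudio
	case audioMatch:
		return DifferentVideo
	default:
		return Unrelated
	}
}

func main() {
	p := uint64(0xF0F0F0F0F0F0F0F0)
	// 3 bits differ in the PHash, but the audio is dissimilar.
	fmt.Println(classify(p, p^0b111, 0.30)) // same video, different audio (e.g. a dub)
}
```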

Describe alternatives you've considered

A third-party plugin could generate these hashes and store them in a separate database. This is doable but not ideal, and it prevents these fingerprints from being used across stash-boxes and by most of the Stash userbase.

Additional context

I have briefly experimented with the Chromaprint library in Python, and it appears to work well for this purpose. There would likely need to be some mechanism to condense its output so that it is functionally similar to the current fingerprint formats.
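One hypothetical way to condense a raw Chromaprint fingerprint into a fixed 64-bit value resembling a PHash is a per-bit majority vote over each half of the track. A raw fingerprint is a variable-length list of 32-bit values. This scheme is purely illustrative and assumed; it is not an existing Stash or Chromaprint format. It discards most of the information, so a real design would need testing against known duplicates and dubs.

```go
package main

import "fmt"

// majorityBits returns a 32-bit word whose bit i is set when bit i is set
// in more than half of the given sub-fingerprints.
func majorityBits(words []uint32) uint32 {
	var counts [32]int
	for _, w := range words {
		for i := 0; i < 32; i++ {
			if w&(1<<uint(i)) != 0 {
				counts[i]++
			}
		}
	}
	var out uint32
	for i, c := range counts {
		if 2*c > len(words) {
			out |= 1 << uint(i)
		}
	}
	return out
}

// summarize64 (hypothetical) folds a raw fingerprint into one 64-bit value:
// the high 32 bits summarise the first half, the low 32 bits the second half.
func summarize64(fp []uint32) (uint64, error) {
	if len(fp) < 2 {
		return 0, fmt.Errorf("fingerprint too short: %d values", len(fp))
	}
	mid := len(fp) / 2
	hi := majorityBits(fp[:mid])
	lo := majorityBits(fp[mid:])
	return uint64(hi)<<32 | uint64(lo), nil
}

func main() {
	fp := []uint32{0xFFFF0000, 0xFFFF0001, 0x0000FFFF, 0x0000FFFF}
	h, _ := summarize64(fp)
	fmt.Printf("%016x\n", h) // ffff00000000ffff
}
```

The result can be compared with the same Hamming distance already used for PHashes, which is the main appeal of a fixed-width summary.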

stg-annon commented 1 year ago

Researching some more, I don't know whether Chromaprint supports recognizing an audio clip from a sample, à la Shazam. Supporting lookups via audio clips could potentially allow finding the source scene of a compilation, which may be worth looking into.

Following a discussion with Scruffy on Discord, it seems any Shazam-style solution would not be worth exploring. Using Chromaprint as an additional identifier still seems viable: it does not work like Shazam and will only give a similarity score for the entire audio track.
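To make "a similarity score for the entire audio track" concrete, one simple approach is to count differing bits between two raw fingerprints. It tries a few small offsets to tolerate leading silence or intros of different length. This technique is swapped in here; it is not a function exposed by Chromaprint or go-fingerprint. This is a minimal sketch. `maxOffset` and `minOverlap` are hypothetical tuning parameters, measured in fingerprint values rather than seconds.

```go
package main

import (
	"fmt"
	"math/bits"
)

// similarityAt compares a[i] with b[i+offset] over the overlapping region and
// returns the fraction of matching bits plus the number of overlapping values.
func similarityAt(a, b []uint32, offset int) (float64, int) {
	diff, n := 0, 0
	for i := range a {
		j := i + offset
		if j < 0 || j >= len(b) {
			continue
		}
		diff += bits.OnesCount32(a[i] ^ b[j])
		n++
	}
	if n == 0 {
		return 0, 0
	}
	return 1 - float64(diff)/float64(32*n), n
}

// bestSimilarity tries offsets in [-maxOffset, maxOffset] and returns the best
// score, ignoring alignments with fewer than minOverlap values in common.
func bestSimilarity(a, b []uint32, maxOffset, minOverlap int) float64 {
	best := 0.0
	for off := -maxOffset; off <= maxOffset; off++ {
		s, n := similarityAt(a, b, off)
		if n >= minOverlap && s > best {
			best = s
		}
	}
	return best
}

func main() {
	a := []uint32{0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEF}
	// Same audio, but with one extra value at the start (e.g. a longer intro).
	b := append([]uint32{0xFFFFFFFF}, a...)
	fmt.Println(bestSimilarity(a, b, 2, 3)) // 1
}
```

A score near 1 means near-identical audio, while unrelated audio tends toward roughly 0.5, since about half the bits differ by chance.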

Another thing to note is the case where a user changes the format of the video but not the audio. A prime example is converting VR content to "normal" POV, which would significantly change the PHash but not the AHash.