stg-annon opened this issue 1 year ago
Researching this some more, I don't know whether Chromaprint supports recognizing an audio clip from a sample (à la Shazam). Supporting lookups via audio clips could potentially allow finding the source scene of a compilation, which may be worth looking into.
Following a discussion with Scruffy on Discord, it seems a Shazam-style solution would not be worth exploring. Using Chromaprint as an additional identifier still seems viable, though: it does not work like Shazam and only gives a similarity score against the entire audio track.
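To illustrate what a "similarity score against the entire track" could look like: Chromaprint's raw fingerprint is a sequence of 32-bit sub-fingerprints, and a common way to compare two of them is to count differing bits (a bit error rate). This is only a minimal sketch; `bit_error_similarity` is a hypothetical helper, and it assumes both tracks start at the same point. Real matchers such as AcoustID also search over time offsets.

```python
from typing import Sequence


def bit_error_similarity(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Return a 0..1 similarity score for two raw Chromaprint fingerprints.

    Hypothetical helper: sub-fingerprints are compared position by position
    (no offset search), and the score is the fraction of matching bits over
    the overlapping length.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    differing_bits = 0
    # zip() stops at the shorter fingerprint, i.e. after n sub-fingerprints.
    for a, b in zip(fp_a, fp_b):
        # Mask to 32 bits so signed and unsigned encodings compare the same.
        differing_bits += bin((a ^ b) & 0xFFFFFFFF).count("1")
    return 1.0 - differing_bits / (32 * n)
```

A score near 1.0 means the audio is essentially identical over the compared span; unrelated audio tends toward roughly 0.5, since about half the bits differ by chance.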
Another thing to note is the case where a user changes the format of the video but not the audio. A prime example is converting VR content to a "Normal" POV, which would significantly change the PHash but not the AHash.
Is your feature request related to a problem? Please describe.
Using a scene's audio to identify the file has been discussed numerous times on Discord and in a few issues. I could not find a dedicated issue to track discussion on the topic, so this issue is for that.
Describe the solution you'd like
Add an audio hash fingerprint type to scenes (AHash)
The existing Chromaprint library could be used for this. It is the same utility used by MusicBrainz, the inspiration for stash-box. For our purposes, go-fingerprint may be more suitable.
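As a rough sketch of getting a fingerprint out of a scene file, Chromaprint ships the `fpcalc` command-line tool, which can print a raw (integer) fingerprint as JSON. This assumes `fpcalc` is on the PATH and was built with FFmpeg so it can decode the audio track of a video container. Note that `fpcalc` only processes the first 120 seconds by default, so the sketch raises `-length`; the `max_seconds` default of 3600 is an arbitrary assumption, and both function names are hypothetical.

```python
import json
import subprocess
from typing import List, Tuple


def parse_fpcalc_json(output: str) -> Tuple[float, List[int]]:
    """Extract (duration, raw fingerprint) from `fpcalc -json -raw` output."""
    data = json.loads(output)
    return float(data["duration"]), [int(v) for v in data["fingerprint"]]


def fingerprint_file(path: str, max_seconds: int = 3600) -> Tuple[float, List[int]]:
    """Run fpcalc on a media file and return its raw Chromaprint fingerprint.

    -length caps how many seconds of audio are processed (fpcalc's own
    default is 120), so it is raised here to cover a full-length scene.
    """
    result = subprocess.run(
        ["fpcalc", "-json", "-raw", "-length", str(max_seconds), path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if fpcalc fails
    )
    return parse_fpcalc_json(result.stdout)
```

Shelling out keeps the example short; a native binding (or go-fingerprint on the Go side) would avoid the external process.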
This is useful when a scene has the same video but different audio: the PHash will show a strong correlation even though the scenes are actually different. One example I am aware of is Spanish dubs of popular MindGeek scenes; having an audio hash would allow users to identify these discrepancies.
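Combining the two fingerprints gives four cases: both match (same scene), only the video matches (e.g. a dub), only the audio matches (e.g. VR content converted to a "Normal" POV), or neither matches. A minimal sketch of that decision, assuming a PHash Hamming distance and a 0..1 audio similarity score are already available; the `classify` function and both thresholds are hypothetical and would need tuning against real data.

```python
from enum import Enum


class Relation(Enum):
    SAME_SCENE = "same scene"
    DIFFERENT_AUDIO = "same video, different audio (e.g. a dub)"
    DIFFERENT_VIDEO = "same audio, different video (e.g. reformatted VR)"
    UNRELATED = "unrelated"


def classify(phash_distance: int, audio_similarity: float,
             phash_threshold: int = 8, audio_threshold: float = 0.85) -> Relation:
    """Combine a PHash distance and an AHash similarity into one verdict.

    Both thresholds are hypothetical placeholder values.
    """
    video_match = phash_distance <= phash_threshold
    audio_match = audio_similarity >= audio_threshold
    if video_match and audio_match:
        return Relation.SAME_SCENE
    if video_match:
        return Relation.DIFFERENT_AUDIO
    if audio_match:
        return Relation.DIFFERENT_VIDEO
    return Relation.UNRELATED
```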
Describe alternatives you've considered
A 3rd-party plugin could generate these hashes and store them in a detached database. This is doable but not ideal, and it prevents these fingerprints from being used across stash-boxes and by most of the Stash userbase.
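For reference, the detached-database route could be as small as the sketch below, assuming an AHash encoded as 16 hex characters (a hypothetical 64-bit format). SQLite has no built-in popcount, so a Hamming-distance function is registered from Python. The table layout, function names, and `max_distance` default are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def open_ahash_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Register hamming(a, b): number of differing bits between two hex hashes.
    conn.create_function(
        "hamming", 2, lambda a, b: bin(int(a, 16) ^ int(b, 16)).count("1"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ahash ("
        " scene_id INTEGER PRIMARY KEY,"
        " hash TEXT NOT NULL)")
    return conn


def store_ahash(conn: sqlite3.Connection, scene_id: int, ahash_hex: str) -> None:
    conn.execute("INSERT OR REPLACE INTO ahash (scene_id, hash) VALUES (?, ?)",
                 (scene_id, ahash_hex))
    conn.commit()


def find_similar(conn: sqlite3.Connection, ahash_hex: str,
                 max_distance: int = 6) -> List[Tuple[int, int]]:
    """Return (scene_id, distance) pairs within max_distance bits, closest first."""
    return conn.execute(
        "SELECT scene_id, hamming(hash, ?) FROM ahash"
        " WHERE hamming(hash, ?) <= ?"
        " ORDER BY hamming(hash, ?), scene_id",
        (ahash_hex, ahash_hex, max_distance, ahash_hex)).fetchall()
```

This works locally, but as noted above, hashes kept this way are invisible to stash-boxes and to other users.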
Additional context
I have briefly experimented with the Chromaprint library in Python and it appears to work well for this purpose. There would likely need to be some mechanism to abbreviate the output so it is functionally similar to the current fingerprint formats.
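One possible way to abbreviate a raw Chromaprint fingerprint (a long list of 32-bit values) into a fixed 64-bit hash, similar in size to a PHash, is a per-bit majority vote. The sketch below splits the fingerprint into two halves and, for each half, sets bit *i* of a 32-bit word when bit *i* is set in at least half of that half's sub-fingerprints. This scheme and the `compact_ahash` name are hypothetical, not something Chromaprint provides. It discards most timing detail, so it is closer to a coarse pre-filter than a full replacement for comparing raw fingerprints.

```python
from typing import Sequence


def compact_ahash(raw: Sequence[int]) -> str:
    """Fold a raw Chromaprint fingerprint into a 16-hex-character (64-bit) hash."""
    if len(raw) < 2:
        raise ValueError("need at least two sub-fingerprints")
    mid = len(raw) // 2
    result = 0
    for segment in (raw[:mid], raw[mid:]):
        word = 0
        for bit in range(32):
            # Count how many sub-fingerprints in this half have this bit set.
            ones = sum(((v & 0xFFFFFFFF) >> bit) & 1 for v in segment)
            if 2 * ones >= len(segment):
                word |= 1 << bit
        # First half fills the high 32 bits, second half the low 32 bits.
        result = (result << 32) | word
    return f"{result:016x}"
```

Two such hashes can then be compared by Hamming distance, the same way 64-bit PHashes are.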