[Feature] Check if shorter clip is duplicated within longer file

stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc

https://stashapp.cc/

GNU Affero General Public License v3.0


[Feature] Check if shorter clip is duplicated within longer file #3706

Open pummer opened 1 year ago

pummer commented 1 year ago

In my library I have some (vintage) files scraped from Web sources. I consider some of these to be "duplicates" because they are just shorter clips of larger files that I also have in the library.

Is there a way to find (in order to eliminate) these files that are only a shorter part (subset) of a longer clip?

I imagine this would be implemented as a part of the existing Duplicate Checker, or else separately as the Clip Checker.

Thanks, love stash and appreciate all dev efforts.

DogmaDragon commented 1 year ago

Stash doesn't analyze the video content. Something like this would need almost frame-by-frame video analysis, which isn't really viable because it would be too resource-intensive.

Stash's pHash implementation (which is what the Duplicate Checker uses) is limited to taking 25 screenshots of frames at set times, generating a hash of the resulting image, and comparing hashes by Hamming distance. Since it takes frames at fixed times, it's duration-sensitive: any small change in duration would land on different frames.
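For illustration, the Hamming-distance comparison described above can be sketched in Go as below. This is a minimal sketch, not Stash's actual code: it assumes the hash is a 64-bit value, and the names hammingDistance, isLikelyDuplicate and maxDistance are hypothetical.

```go
package main

import "math/bits"

// hammingDistance counts the bit positions at which two 64-bit hashes differ.
// Assumption: the perceptual hash fits in a uint64.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isLikelyDuplicate reports whether two hashes differ in at most maxDistance bits.
// maxDistance is a hypothetical tuning knob, not a value taken from Stash.
func isLikelyDuplicate(a, b uint64, maxDistance int) bool {
	return hammingDistance(a, b) <= maxDistance
}
```

A short clip cut from a longer file would have its frames sampled at entirely different points of the content, so its hash would normally sit far from the longer file's hash under any small maxDistance.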

ghost commented 1 year ago

Related to #1656 and #542.

This could be solved by importing/exporting custom scene relationships from/to stash-box. The relationship in this case could be "included in another scene".

stg-annon commented 1 year ago

The only thing I can think of off the top of my head that could do something like this in a performant way is an audio hash like AcoustID/Chromaprint, which would need to be implemented as another fingerprint within stash. This fingerprinting method is intended for music, and I don't really know how well it would work in this application, but I believe it does support matching partial audio clips against other sections of audio.
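To make the partial-matching idea concrete, here is a rough Go sketch of sliding a short clip's fingerprint along a longer file's fingerprint and keeping the offset with the fewest differing bits. It assumes both fingerprints are already available as slices of 32-bit integers (the form of a raw Chromaprint fingerprint); bestOffset is a hypothetical helper, not part of Chromaprint, AcoustID or stash, and a real matcher would need to be more robust than this.

```go
package main

import "math/bits"

// bestOffset slides clip along full and returns the offset (in fingerprint
// items) with the lowest total bit error, plus the average number of
// differing bits per item at that offset (0 to 32).
// It returns -1 if clip is empty or longer than full.
func bestOffset(full, clip []uint32) (int, float64) {
	if len(clip) == 0 || len(clip) > len(full) {
		return -1, 0
	}
	best := -1
	bestErr := len(clip)*32 + 1 // larger than any possible total error
	for off := 0; off+len(clip) <= len(full); off++ {
		errBits := 0
		for i, c := range clip {
			errBits += bits.OnesCount32(full[off+i] ^ c)
			if errBits >= bestErr {
				break // this offset cannot beat the best one so far
			}
		}
		if errBits < bestErr {
			best, bestErr = off, errBits
		}
	}
	return best, float64(bestErr) / float64(len(clip))
}
```

A low average bit error at the best offset would suggest the clip's audio appears inside the longer file starting at that position. What counts as "low" would have to be tuned on real data.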