stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[Feature] Add "Duplicated (Stash ID)" filter #3441

Open SpedNSFW opened 1 year ago

SpedNSFW commented 1 year ago

Is your feature request related to a problem? Please describe. With the "Duplicated (phash)" filter, we can merge scenes that have a duplicate phash, but this doesn't account for all duplicates of a scene. Having a duplicated stash ID filter would give us another option.

Describe the solution you'd like A "Duplicated (Stash ID)" filter for scenes that would only show scenes that share a Stash ID with one or more other scenes. The working SQL for that is:

SELECT ssi.stash_id, sc.title
FROM scenes sc
  JOIN scene_stash_ids ssi
    ON sc.id = ssi.scene_id
WHERE ssi.stash_id IN (
  SELECT stash_id
  FROM scene_stash_ids
  GROUP BY stash_id
  HAVING COUNT(*) > 1
);
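As a quick sanity check, this lookup can be exercised against a throwaway SQLite database. A minimal sketch, with table and column names assumed from the query in this issue rather than taken from the real Stash schema, that lists every scene sharing a duplicated Stash ID:

```python
import sqlite3

# In-memory throwaway DB; schema is assumed from the issue, not Stash itself.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE scenes (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE scene_stash_ids (scene_id INTEGER, stash_id TEXT);
INSERT INTO scenes VALUES (1, 'Scene A'), (2, 'Scene B'), (3, 'Scene C');
INSERT INTO scene_stash_ids VALUES (1, 'abc'), (2, 'abc'), (3, 'xyz');
""")

# List every scene whose stash_id appears on more than one scene.
rows = con.execute("""
SELECT ssi.stash_id, sc.title
FROM scenes sc
  JOIN scene_stash_ids ssi ON sc.id = ssi.scene_id
WHERE ssi.stash_id IN (
  SELECT stash_id FROM scene_stash_ids
  GROUP BY stash_id HAVING COUNT(*) > 1
)
ORDER BY sc.id;
""").fetchall()

print(rows)  # [('abc', 'Scene A'), ('abc', 'Scene B')]
```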

Unfortunately I don't know GraphQL (and don't really have the time to learn it), otherwise I would implement this myself. In theory it should be pretty simple to add.

Describe alternatives you've considered An alternative I've thought about, that may be better in the long run, is to just have a "Duplicated" filter that brings up a modal similar to the tag filter, with options for each scene detail. This would render both this request and the current "Duplicated (phash)" filters obsolete. With this idea, you could select multiple fields and all would have to match.
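The multi-field idea above boils down to grouping scenes by a composite key built from the selected fields. A minimal sketch of that grouping logic, with hypothetical field names and no claim about how the actual filter would be implemented:

```python
from collections import defaultdict

def find_duplicates(scenes, fields):
    """Group scenes by the user-selected detail fields (all must match)
    and return the groups containing more than one scene."""
    groups = defaultdict(list)
    for scene in scenes:
        key = tuple(scene.get(f) for f in fields)
        groups[key].append(scene["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical scene records for illustration.
scenes = [
    {"id": 1, "stash_id": "abc", "title": "A"},
    {"id": 2, "stash_id": "abc", "title": "A"},
    {"id": 3, "stash_id": "abc", "title": "B"},
]
print(find_duplicates(scenes, ["stash_id"]))           # [[1, 2, 3]]
print(find_duplicates(scenes, ["stash_id", "title"]))  # [[1, 2]]
```

Selecting more fields makes the match stricter, which is exactly the behavior the modal idea describes.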

JanJastrow commented 1 year ago

I did the exact same thing four days ago: copied the DB and ran a local SQLite query like the one above to figure this out. It would be a useful feature.

STBYRUD commented 1 year ago

That would be incredibly useful!

ben-ba commented 10 months ago

Hint: Some scenes have multiple parts with the same stashid!

becks0815 commented 8 months ago

Hint: Some scenes have multiple parts with the same stashid!

That's fine: the current duplicate finder likewise shows videos which aren't exact duplicates, just very similar. So please leave the sorting/deleting to the user.

I also would like to have a function like the duplicate file finder, but based on the stash id of identified files instead of a similar phash.

scruffynerf commented 8 months ago

I wrote code to add a tag for this. Far better than a dedicated filter, and it works today. I'll get the code into the plugin repo and comment back when I do.

wirbelsaeure commented 8 months ago

Doesn't this actually belong in the "Scene Duplicate Checker" tool? I'd guess users would expect an additional function there that lists duplicate scenes grouped by Stash ID.

https://github.com/stashapp/stash/issues/4041

stg-annon commented 8 months ago

It's been discussed before. The thing about duplicate Stash IDs is that being sure of an actual duplicate relies on the scenes being tagged correctly. The tagging process uses PHash anyway, so using PHash to detect dupes in the first place has fewer issues than Stash ID, because a scene could have been improperly tagged with that ID.

d3f113 commented 8 months ago

It's been discussed before. The thing about duplicate Stash IDs is that being sure of an actual duplicate relies on the scenes being tagged correctly. The tagging process uses PHash anyway, so using PHash to detect dupes in the first place has fewer issues than Stash ID, because a scene could have been improperly tagged with that ID.

PHash has the opposite problem: with low precision it can also mix together scenes which aren't actually the same. In both cases, low-precision phash matches and shared Stash IDs, a user would need to decide in the end; it shouldn't be done automatically.

The problem is that phash doesn't find many duplicates, probably because it hashes frames at fixed timestamps instead of information-rich, unique frames. This could be addressed either by humans identifying duplicates via Stash ID, or by an improved phash. The latter is much harder.
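For context on the "precision" knob discussed above: perceptual hashes are typically compared by Hamming distance, and raising the allowed distance (lower precision) is what pulls in near-but-not-identical scenes. A minimal sketch with made-up 64-bit hash values, not Stash's actual matching code:

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit perceptual hashes."""
    return bin(a ^ b).count("1")

def is_dupe(a: int, b: int, max_distance: int) -> bool:
    """Treat two phashes as duplicates if they differ in at most
    `max_distance` bits; a larger threshold means lower precision."""
    return hamming(a, b) <= max_distance

h1 = 0xDEADBEEFDEADBEEF  # made-up hash values for illustration
h2 = 0xDEADBEEFDEADBEEB  # differs from h1 in a single bit

print(hamming(h1, h2))     # 1
print(is_dupe(h1, h2, 8))  # True  -- loose threshold, lower precision
print(is_dupe(h1, h2, 0))  # False -- exact match required
```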

becks0815 commented 8 months ago

I understand the issues with how this would work, but besides using such a filter to find duplicates, I would also use it to identify scenes which have been tagged incorrectly. So it's useful for at least two data-cleansing jobs: finding duplicates, and finding errors.