stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0
9.09k stars 788 forks source link

[Bug] "Scrape All" does not return matches if remote PHash was distance-matched #2269

Closed peolic closed 2 years ago

peolic commented 2 years ago

Stash Version: v0.12.0-31-g184117ea

When called on a Stash-Box source (StashDB), the "Scrape All" tagger button does not return matches when the PHash was matched via hamming-distance matching, and is not equal to the original PHash sent. Maybe not really a bug, but it's something that caused some confusion on Discord, and looks to me like an oversight.

To reproduce, what I did was:

  1. Take a random PHash fingerprint from StashDB
  2. Modify it to be within a hamming distance of 2 from the original (make sure the modified hash does not exist on the scene).
  3. Convert hex to int and insert into the database as the PHash value for any scene not matched with with a Stash-Box source.
  4. Tagger view, make sure scene is visible, then Scrape All. No match found for the scene
  5. Scrape by Fragment on the scene itself, Match found by PHash.

I debugged this a little, and I believe the reason for this is: Scrape all (ultimately) calls: https://github.com/stashapp/stash/blob/184117ea39132ceb8ee95f4ce44c537261abc578/pkg/scraper/stashbox/stash_box.go#L72

The fingerprints are sent correctly and scenes are received correctly, however, when matching the submitted hashes to the matched scenes' hashes, the PHash is only checked for an exact fingerprint match on any of the returned scenes, which means the distance match (for example, of 2) is never linked to the local scene and the result(s) is never shown.

https://github.com/stashapp/stash/blob/184117ea39132ceb8ee95f4ce44c537261abc578/pkg/scraper/stashbox/stash_box.go#L135-L143

This caused a confusion on Discord, where Scrape All did not match the scene/fingerprint, but Scrape by Fragment did. That's because the latter only has to "deal" with a single scene result - either matched or not matched, and does not need to perform a match based on the fingerprint hash.

peolic commented 2 years ago

POC of a fix for the issue described above. This is very rudimentary - just made to confirm my theory. https://github.com/stashapp/stash/compare/develop...peolic:scrape-all-phash-distance Please, feel free to improve upon it and submit a PR.

floydcg commented 2 years ago

I just made a test build to test this out and can confirm it fixed my issue with matching. I ran a scrape all using the official develop build and came up with 0 matches, but was able to fingerprint match several individually. I then ran the test build I made and it scrape all matched the 3 or 4 I was able to fingerprint match previously.

floydcg commented 2 years ago

distance Just wanted to share the difference changing the distance made on my instance of stash. I now understand why when I would do a scrape all it would spin for a long time sometimes and only show one or 2 matches. Perhaps this could be made into a variable on the settings screen for how much to distance match? I believe this patch was distance of 4? but perhaps make this a user adjustable option rather than a hard coded digit?