hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

Non pixel dup in the pixel dup queue #1158

Open alidan opened 2 years ago

alidan commented 2 years ago

Hydrus version

v487

Operating system

Windows other (specify in comments)

Install method

Installer

Install and OS comments

Windows 7 Ultimate 64-bit

Bug description and reproduction

I have around 2.3 million dups in total and 6.94 million potential files. I have 100% exact, 100% very similar, 9% similar, and 22 files speculative.

In a batch of 250 pixel duplicates, I had one set that was not a pixel duplicate or even close. Not sure why it showed up there; the program even knew the files weren't pixel duplicates.

I have 300k pixel dups, and my current subset of that is ~14k. client_2022-06-02_01-08-01

Not sure how to reproduce this or whether it was a one-off. I'm going to parse two more 250-image queues and see if something crops up.

Log output

No response

alidan commented 2 years ago

Had another one show up in the pixel dup queue. It seems to happen around 1 time in 1,000.

hydrusnetwork commented 2 years ago

Thank you for this report. I've had a couple like this now, and while some were legacy issues that were fixed over time with scheduled maintenance, we still seem to have some actual search problems where things aren't lining up correctly.

If you hit up database->file maintenance->manage scheduled jobs, I assume you don't have a whole ton of jobs still waiting to go, right? I scheduled everyone to get a pixel hash regeneration for all files a while ago, and if you have millions of files, they may have never finished.

There is still a disconnect between the database's idea of duplicate pixel files and the filter's. They actually use some different code. I am planning to unify them soon, so maybe I will accidentally fix this. I will investigate this more as I work in this area, and I hope to have a definitive fix or explanation in time.
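To illustrate how two separate code paths can disagree about a pair, here is a minimal sketch. It is not hydrus's actual code: `Image`, `pixel_hash`, `db_says_pixel_dup`, `filter_says_pixel_dup`, and `find_disagreements` are hypothetical names, and the assumption is that the search side trusts a stored hash while the filter side compares freshly decoded pixels. Under that assumption, a stale stored hash would be enough to put a non-pixel-dup pair in the pixel dup queue.

```python
import hashlib
from typing import NamedTuple, Optional


class Image(NamedTuple):
    """Hypothetical decoded image: dimensions plus raw pixel bytes."""
    width: int
    height: int
    pixels: bytes


def pixel_hash(img: Image) -> bytes:
    """SHA-256 over the decoded pixels, with the dimensions mixed in so
    that a 2x8 and a 4x4 image with identical bytes hash differently."""
    h = hashlib.sha256()
    h.update(f"{img.width}x{img.height}:".encode("ascii"))
    h.update(img.pixels)
    return h.digest()


def db_says_pixel_dup(stored_a: Optional[bytes], stored_b: Optional[bytes]) -> bool:
    # Search-side check (assumed): trusts whatever hash was stored earlier.
    return stored_a is not None and stored_a == stored_b


def filter_says_pixel_dup(a: Image, b: Image) -> bool:
    # Filter-side check (assumed): compares the freshly decoded pixels.
    return (a.width, a.height) == (b.width, b.height) and a.pixels == b.pixels


def find_disagreements(pairs):
    """Return the ids of pairs where the two checks disagree.

    pairs: iterable of (pair_id, img_a, stored_hash_a, img_b, stored_hash_b).
    """
    return [
        pair_id
        for pair_id, a, stored_a, b, stored_b in pairs
        if db_says_pixel_dup(stored_a, stored_b) != filter_says_pixel_dup(a, b)
    ]
```

Running something like `find_disagreements` over a batch would flag only the pairs where the stored hashes match but the decoded pixels do not (or the reverse), which is the kind of pair described in this report.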

alidan commented 2 years ago

Yeah, I have millions of images that the dup system will deal with, but nothing in maintenance. I think when that redo of the pixel hash came through, I brute-forced it because of how useful it would be to be able to parse mindlessly.