Open pozieuto opened 4 years ago
I would instead recommend that an API endpoint be exposed to create.remove a potential pair and that third parties be permitted to write their own phashing.
False negatives should be exceptionally rare, but if they aren't hydru's last step should be a straight compare after the broad phase pHash.
The idea of the perceptual hashing system is to be performant while also being reasonably accurate. Full-image comparisons are prohibitively intensive for large sets of media by comparison. Pumping image data through an API is also not as performant as natively doing hashes and searches along with other maintenance, I would think. The hashes also need to be stored in Hydrus' database for them to be useful in practice.
Existing research: https://github.com/thorn-oss/perception @pozieuto take a look at some of the new algorithms they have.
I end up with a lot of false negatives for some alternate types when using the existing perceptual hash system, so I've been considering how to expand upon it.
My understanding of the current system, based on resources linked by HyDev in the past, is that it does these three things in some order:
The final step is comparing the 64 values against the criterion to assign them a binary value depending on if they're smaller or larger.
Presently, I believe Hydrus is limited to averaging for the size reduction, greyscaling for the value reduction, and averaging again for the criterion determination. But there are other operations that could be done for these steps instead, which might prove more useful in detecting certain types of alternates:
Not all of these combinations would necessarily be valuable, but I believe at least some of these would be effective in practice for certain types of alternates, so having them available as options for advanced searches would be useful. Of course, something very important to consider is that all combinations of these variants would produce a lot of different hashes - far more than is reasonable to store for every image. The searching functionality would need to be expanded upon to make this useful.
Other hash types may also be feasible. My two ideas here are for hue and "features".
For a hue hash: Since the hue is a "circular" value space rather than a line with a discrete start and end, any usual calculations like an "average" would be worthless. Instead, the most "dominant" hue needs to be determined somehow, such that two alternates are more likely to have similar hashes (assuming their dominant hues closely match). Having every pixel in the image add to the score of its hue relative to how saturated or transparent it is (so fully unsaturated or transparent pixels contribute nothing) would produce an approximate frequency distribution of perceived hues, on which some statistical analysis needs to be done to determine a dominant value. What seems most intuitive to me would be to smooth the dataset using kernel density estimation and use the hue at highest point of the resulting curve. Then, imagining the hue spectrum as a "colour wheel", slice it with a straight line through the midpoint with an angle matching that of the tangent at the dominant hue's position on the circle. We now have two equal sections of the hue spectrum that we can use to assign binary values to different segments of the image as before. This division could also be done such that the two points dividing the hue spectrum remain equidistant to the dominant hue but are adjusted such that the two sections of the hue spectrum capture approximately half of the hue frequency distribution (making the two sections most likely represent different amounts of the hue spectrum, by making a "V" shaped slice in the "wheel"). Then for each of the 64 sections of the image, determine the dominant hue (like before), and assign a binary value for that section based on which of the two sections determined earlier of the hue spectrum that its dominant hue falls into. This step would also benefit from some kind of threshold to account for transparency and chroma noise such that if there's hardly any or no colour to the section at all, then it is assigned the binary value corresponding to the section of the hue spectrum that does not include the image's dominant hue.
For a "feature" hash: Before performing any reductions, run an edge detection or phase stretch transformation algorithm on the image to produce an image of its "features", and then do a specialized perceptual hash of that.
Currently all of this is theoretical, and I haven't done any implementations or tests to determine the applicability of these different ideas in practice.