Closed by yruen 2 years ago
Perceptual hashing implemented in https://github.com/yruen/UpscaleTesting/commit/1cfcb25651601731e3ef1e83716cff6d3e36f077
Needs optimization: it checks the hash of every file over and over again, which adds time and resource consumption. The counter (with hashing differences at <10) said 934 pairs of files are identical; ideally it should be 1/3 of that or less, since there are multiple copies of the same texture, just in different resolutions.
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_32x32_3F957D5F523C08D6_13.png
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_32x32_6BD5FF7BCAD0D1AA_13.png
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_64x64_18BB22DB187BB0AC_13.png
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_128x128_53D525381E1A8F56_13.png
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_128x128_D6A9069D4928BC8E_13.png
./Textures/tex1_64x64_068992D9C9269303_13.png is very similar to ./Textures/tex1_256x256_C9734190EF9AA10A_13.png
For example, tex1_256x256_C9734190EF9AA10A_13.png gets checked again.
The latest commit improves this but still has issues: it can mostly only group higher-resolution textures (>=32x32).
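The repeated hashing described above could probably be avoided by hashing each file once and then comparing every unordered pair exactly once. This is only a minimal sketch under assumptions: `hash_fn` is a hypothetical callable (for instance a wrapper around an imagehash function) that is assumed to return the hash as an integer, and `max_distance` mirrors the <10 difference setting mentioned earlier.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def similar_pairs(paths, hash_fn, max_distance=10):
    """Return (path_a, path_b, distance) for each pair under the threshold.

    hash_fn is a placeholder: any callable mapping a path to an int hash.
    """
    # Hash every file exactly once, up front.
    hashes = {path: hash_fn(path) for path in paths}
    pairs = []
    # combinations() yields each unordered pair once, so a pair that was
    # compared as (a, b) is never revisited as (b, a).
    for a, b in combinations(sorted(hashes), 2):
        distance = hamming(hashes[a], hashes[b])
        if distance < max_distance:
            pairs.append((a, b, distance))
    return pairs
```

The number of comparisons is still quadratic in the number of files, but the expensive part (decoding and hashing each image) happens once per file.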
Add 2 or 3 pass algorithm options:
First pass with a difference at or very close to 0 (just change the current parameter options to difference=0)
Add another function, to split up the code, that does something like this:
Get folders from os.listdir (or os.scandir, since it's apparently faster)
Inside those folders, get the smallest image
Hash the smallest image in each folder (with a less complex hash) and compare its hash with all images in the parent folder. A cache file, or something else that keeps track of hashes per file, might be useful in this case
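The folder steps above could be sketched roughly as follows. This assumes the resolution can be read straight from the file name (as in tex1_64x64_068992D9C9269303_13.png) rather than by opening the image; `smallest_image` and `smallest_per_folder` are hypothetical helper names, not functions from the repository.

```python
import os
import re

# Assumption: names contain the resolution as _<width>x<height>_,
# e.g. tex1_64x64_068992D9C9269303_13.png
RESOLUTION = re.compile(r"_(\d+)x(\d+)_")


def smallest_image(folder):
    """Return the path of the lowest-resolution texture in folder, or None."""
    best_path, best_area = None, None
    with os.scandir(folder) as entries:
        for entry in entries:
            match = RESOLUTION.search(entry.name)
            if not entry.is_file() or match is None:
                continue  # skip subfolders and files without a resolution
            area = int(match.group(1)) * int(match.group(2))
            if best_area is None or area < best_area:
                best_path, best_area = entry.path, area
    return best_path


def smallest_per_folder(parent):
    """Map each subfolder of parent to its smallest texture (or None)."""
    result = {}
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir():
                result[entry.path] = smallest_image(entry.path)
    return result
```

The returned paths can then be hashed and compared against the images in the parent folder. If two files in a folder share the smallest resolution, whichever one `os.scandir` yields first is kept.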
Expected result: get very high or exact accuracy from the first batch of hashing on high-res images, then lower the strictness while keeping accuracy high?
Possible issue: there could be a lot more folders, since a low-res image could match another low-res image. It's possibly a good idea to separate the images by their default resolution first, or to split off exclusively the low-res ones like 16x16.
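A rough sketch of the 2 or 3 pass idea: start with an exact-match pass, then loosen the threshold and compare only one representative per group against the other groups. The thresholds `(0, 4, 10)` are illustrative assumptions (only 0 and 10 appear in this thread), hashes are assumed to be integers, and `multi_pass_groups` is a hypothetical name.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def multi_pass_groups(hashes, thresholds=(0, 4, 10)):
    """Group names by hash similarity, loosening the threshold each pass.

    hashes maps a name to an integer hash. The first member of each
    group acts as its representative in later passes.
    """
    # Start with every item in its own group.
    groups = [[name] for name in sorted(hashes)]
    for limit in thresholds:
        merged = []
        for group in groups:
            rep_hash = hashes[group[0]]
            for target in merged:
                # <= so that a limit of 0 means "identical hashes only".
                if hamming(rep_hash, hashes[target[0]]) <= limit:
                    target.extend(group)
                    break
            else:
                # No existing group was close enough: keep it separate.
                merged.append(group)
        groups = merged
    return groups
```

Because later passes only compare representatives, a loose threshold is never applied between arbitrary members of two groups, which may help keep the looser passes from chaining unrelated textures together. Which member is used as the representative (here simply the first name in sorted order) is a design choice; the smallest image per group would be closer to the steps listed above.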
These images are super annoying: their difference value is extremely high despite obviously being the same texture at different resolutions. Few other images are like this.
For comparison, these two images just fit under the difference value threshold
The whash function in imagehash seems to group more images together, but with some odd behavior at the moment (only matching one pair of images in a set of 4, creating 2 folders with the same-looking textures) and lower accuracy.
Closing because it's probably impossible to get perfect pairing
Current idea for achieving this:
- CV2 image comparing (could be slow)
- Perceptual hashing (slow at first but then speeds up due to caching?)
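The "speeds up due to caching" part could look something like the sketch below: a JSON file keyed by path that stores each hash together with the file's size and modification time, so a file is only re-hashed when it changes. All names here (`load_cache`, `cached_hash`, `save_cache`) are hypothetical, and `hash_fn` is assumed to return something JSON-serializable such as a hex string.

```python
import json
import os


def load_cache(cache_path):
    """Load the hash cache, or start empty if it does not exist yet."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    return {}


def cached_hash(path, cache, hash_fn):
    """Return the hash for path, recomputing only if the file changed."""
    stat = os.stat(path)
    key = os.path.abspath(path)
    entry = cache.get(key)
    if (entry is not None
            and entry["mtime"] == stat.st_mtime_ns
            and entry["size"] == stat.st_size):
        return entry["hash"]  # cache hit: no hashing needed
    value = hash_fn(path)
    cache[key] = {"mtime": stat.st_mtime_ns, "size": stat.st_size, "hash": value}
    return value


def save_cache(cache_path, cache):
    """Write the cache back to disk as JSON."""
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
```

With imagehash, storing `str(hash)` and restoring it with `imagehash.hex_to_hash` should work as the serializable form, though that pairing is an assumption about how the cache would be wired up here.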