mazzzystar / Queryable

Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.
https://queryable.app
MIT License

Photo similarity looks a little bit low #35

Closed yujinqiu closed 7 months ago

yujinqiu commented 7 months ago

Hi there, I'm trying to print the top-N similarities, with the output below. Every sim value is below 0.5, yet the results look correct to a human. I'm not sure whether this is a bug or not.

```
photoID: E223B067-975A-49AF-AC42-6387EFC2C73D/L0/001, sim: 0.260636
photoID: E41F3701-3434-44E8-B40C-F7708C43EBA0/L0/001, sim: 0.257243
photoID: C0BD9F08-99D0-42E6-A1FA-275B6FCA4141/L0/001, sim: 0.256340
photoID: 287F7C32-D2B3-43D9-AB2C-A24E26CA4CD4/L0/001, sim: 0.251475
photoID: EDBFBAF6-4503-48FA-97DF-848A1F74C30A/L0/001, sim: 0.251110
photoID: AC47FDB3-4C68-4142-B0B5-40631EE0D41A/L0/001, sim: 0.249579
photoID: EB8D302D-C742-43E6-9ED0-156D669F1814/L0/001, sim: 0.249101
photoID: AF3ABD8F-4A75-4178-95C4-A7D218123BFF/L0/001, sim: 0.248550
photoID: 3412635C-AD4C-49FB-82BF-05552F964DAF/L0/001, sim: 0.247968
photoID: 78DBA56B-B2E1-4125-B746-66EF3319CC89/L0/001, sim: 0.247247
```
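
For reference, a minimal sketch of how this kind of ranking could be produced (hypothetical names; `textEmbedding` and `photoEmbeddings` stand in for the unit-normalized CLIP vectors, this is not Queryable's actual code):

```swift
import Foundation

/// For unit-normalized embeddings, cosine similarity is just the dot product.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

/// Hypothetical helper: rank all photo embeddings against a text embedding
/// and print the top-N matches in the format shown above.
func printTopN(textEmbedding: [Float], photoEmbeddings: [String: [Float]], n: Int = 10) {
    let ranked = photoEmbeddings
        .map { (id: $0.key, sim: cosineSimilarity(textEmbedding, $0.value)) }
        .sorted { $0.sim > $1.sim }
        .prefix(n)
    for entry in ranked {
        print("photoID: \(entry.id), sim: \(String(format: "%.6f", entry.sim))")
    }
}
```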
mazzzystar commented 7 months ago

I've reproduced the normalization the same way the PyTorch version does it and got identical results, so I don't believe the issue lies in the normalization step.
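
For reference, the L2 normalization in question looks like this in Swift (mirroring PyTorch's `F.normalize(x, dim=-1)`; the helper name is hypothetical):

```swift
import Accelerate

/// L2-normalize an embedding so cosine similarity reduces to a dot product,
/// matching what PyTorch's F.normalize(x, dim=-1) does.
func l2Normalized(_ v: [Float]) -> [Float] {
    let norm = cblas_snrm2(Int32(v.count), v, 1)  // Euclidean norm via Accelerate
    guard norm > Float.ulpOfOne else { return v } // avoid division by ~0
    return v.map { $0 / norm }
}
```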

I suspect it's because the embedding dimension (512) is high: even when text and image match perfectly along a low-dimensional manifold, the cosine similarity averaged over all dimensions can still look low, because the remaining dimensions contribute near-zero terms.
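
One way to see the scale effect (a back-of-the-envelope check, not Queryable's code): for random unit vectors in 512 dimensions, cosine similarity concentrates around 0 with standard deviation about 1/√512 ≈ 0.044, so a text-image score of ~0.25 is already several standard deviations above chance. Raw CLIP text-image similarities for good matches typically land in the 0.2–0.35 range, which is also why CLIP scales its logits by a learned temperature during training.

```swift
import Foundation

/// Roughly uniform random unit vector (uniform components, then L2-normalize;
/// Gaussian components would be exactly uniform on the sphere, this is close enough).
func randomUnitVector(dim: Int = 512) -> [Float] {
    let raw = (0..<dim).map { _ in Float.random(in: -1...1) }
    let norm = sqrt(raw.reduce(0) { $0 + $1 * $1 })
    return raw.map { $0 / norm }
}

// The mean similarity of unrelated pairs sits near 0, far below the ~0.25 seen above.
let sims = (0..<1_000).map { _ in
    zip(randomUnitVector(), randomUnitVector()).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}
print("mean sim of random pairs:", sims.reduce(0, +) / Float(sims.count))  // ≈ 0.0
```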

mazzzystar commented 7 months ago

The sim score between image and image pairs is in the normal range, so I think the metric itself is correct.
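
A self-contained version of that sanity check (hypothetical snippet): with properly normalized embeddings, any vector compared against itself should score exactly 1 up to floating-point error.

```swift
import Foundation

// Sanity check: for a unit-normalized embedding, sim(x, x) == 1.
let raw = (0..<512).map { _ in Float.random(in: -1...1) }
let norm = sqrt(raw.reduce(0) { $0 + $1 * $1 })
let x = raw.map { $0 / norm }
let selfSim = zip(x, x).reduce(Float(0)) { $0 + $1.0 * $1.1 }
print(selfSim)  // ≈ 1.0, confirming the metric behaves as expected
```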