perone / euclidesdb

A multi-model machine learning feature embedding database
http://euclidesdb.readthedocs.io

How to sort entire database on image? #22

Closed radao closed 2 years ago

radao commented 5 years ago

I see that the main search function exposed is db.find_similar_image, which takes an image, a model, and a topk parameter as input and returns the topk most similar images. If we want to sort the entire database against a query image, would setting topk to the number of images added to the database for that model space work, or would it be prohibitively slow for large databases (500k+ images)? I plan to test this myself, but wanted to check whether you have already run similar tests.

perone commented 5 years ago

Hi @radao, I haven't tested that scenario, but since EuclidesDB only has to return IDs, distances, and the model name, it should work fine. It might be a little slow only if you have a very slow connection. Please let me know how the test goes, because I'm curious as well. Thanks for testing that!
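
For readers following the thread, here is a rough sketch of what "topk equal to the database size" amounts to: ranking every stored embedding by its distance to the query, compared with keeping only the k nearest. It does not call EuclidesDB and is not its implementation. The function names, the in-memory dict of embeddings, the vector size, and the brute-force Euclidean distance are all assumptions made for illustration, and the database's own index may work differently.

```python
import heapq
import math
import random


def rank_all(query, embeddings):
    """Return [(id, distance), ...] for every stored vector, nearest first.

    Hypothetical helper: equivalent in spirit to asking for topk == len(embeddings).
    """
    scored = [(item_id, math.dist(query, vec)) for item_id, vec in embeddings.items()]
    # Full sort of all N results: O(N log N) comparisons.
    return sorted(scored, key=lambda pair: pair[1])


def rank_topk(query, embeddings, k):
    """Return only the k nearest (id, distance) pairs, nearest first."""
    scored = ((item_id, math.dist(query, vec)) for item_id, vec in embeddings.items())
    # A bounded heap avoids sorting the whole collection when k is small.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy data standing in for image embeddings (assumed: 2000 vectors of size 16).
random.seed(0)
embeddings = {i: [random.gauss(0, 1) for _ in range(16)] for i in range(2000)}
query = embeddings[42]  # querying with a stored vector, so its distance is 0

full_ranking = rank_all(query, embeddings)
top5 = rank_topk(query, embeddings, 5)

print(full_ranking[0])                # nearest item is the query itself
print(top5 == full_ranking[:5])       # the top-k list is a prefix of the full ranking
```

In this sketch the full ranking costs one distance computation per stored item plus a sort, and it produces one (ID, distance) pair per item. That is consistent with the point above that the response payload is small per result, so transfer time grows with the number of results requested.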