perone / euclidesdb

A multi-model machine learning feature embedding database
http://euclidesdb.readthedocs.io

About the scalability #23

Open LiberiFatali opened 5 years ago

LiberiFatali commented 5 years ago

Thank you for creating this.

With millions or billions of feature vectors, how should we scale? Where is the index stored (in RAM or on disk)? How fast is it to add a new image and refresh the index?

perone commented 5 years ago

Hi @LiberiFatali, sorry for the delay in answering. The features are stored on disk; where the index is stored depends on which search engine you use. If you use faiss or annoy, the index is kept in memory. Many indexes don't support adding new items without a rebuild (the trade-off between fast search and slow updates), so rebuilding can be very expensive if you update frequently and have a lot of items in the database. If you have millions or billions of vectors, I would recommend using faiss and quantizing the index to reduce its in-memory size; there are many guidelines for selecting the proper faiss index depending on your requirements.
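
To give a rough sense of why quantization matters at that scale, here is a back-of-the-envelope sketch in plain Python (it does not call faiss or EuclidesDB). It compares raw float32 storage against product-quantized codes. The function names, the 512-dimensional vectors, and the 64 sub-quantizers at 8 bits each are illustrative assumptions, not values taken from EuclidesDB, and the estimate ignores extra overhead such as vector ids or inverted lists.

```python
def flat_index_bytes(n_vectors: int, dim: int) -> int:
    """Uncompressed storage: one float32 (4 bytes) per vector component."""
    return n_vectors * dim * 4


def pq_index_bytes(n_vectors: int, dim: int, m: int, nbits: int = 8) -> int:
    """Rough size of product-quantized storage (hypothetical helper).

    Each vector is split into m sub-vectors and each sub-vector is replaced
    by an nbits-wide centroid id. The codebooks add a fixed cost of
    m * 2**nbits centroids with dim / m float32 components each.
    """
    if dim % m != 0:
        raise ValueError("dim must be divisible by the number of sub-quantizers m")
    code_bytes_per_vector = (m * nbits + 7) // 8  # round up to whole bytes
    codebook_bytes = (2 ** nbits) * dim * 4       # m * 2**nbits * (dim / m) * 4
    return n_vectors * code_bytes_per_vector + codebook_bytes


if __name__ == "__main__":
    n, d = 1_000_000_000, 512  # assumed: one billion 512-d vectors
    flat = flat_index_bytes(n, d)
    pq = pq_index_bytes(n, d, m=64)
    print(f"flat float32: {flat / 1e9:.1f} GB")
    print(f"PQ (m=64, 8 bits): {pq / 1e9:.1f} GB")
    print(f"compression ratio: {flat / pq:.1f}x")
```

Under these assumptions the codes take 64 bytes per vector instead of 2048, which is what can make a billion-vector index fit in RAM on a single machine, at the cost of approximate distances.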

LiberiFatali commented 5 years ago

Thanks a lot. I'm researching options for a production environment, so I have a few more questions.

Is it safe to search/add/remove in EuclidesDB concurrently? How many requests could EuclidesDB handle at the same time?