glookka opened 8 months ago
In Slack we discussed how much RAM it takes now (without any quantization).
I ran a test with 550,000 records. Here are the resulting .spknn file sizes (which match RAM usage) for different parameters:
All runs use `vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE'`, varying only the parameters below:

| hnsw_m | hnsw_ef_construction | fast_fetch | .spknn size |
|--------|----------------------|------------|-------------|
| 50     | 600                  | default    | 968 MB      |
| 10     | 600                  | default    | 725 MB      |
| 10     | 600                  | 0          | 683 MB      |
| 10     | 64                   | 0          | 355 MB      |
| 10     | 32                   | 0          | 355 MB      |
An interesting post from Cohere about using Faiss' IVFPQ: https://www.linkedin.com/posts/reimersnils_%3F%3F%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F%3F-%3F%3F-%3F%3F%3F-ugcPost-7214251130984296448-fDs2/?utm_source=share&utm_medium=member_desktop

Some info on quantization methods: https://neuml.hashnode.dev/all-about-vector-quantization
Currently, Manticore uses an HNSW index over float vectors for its KNN search implementation. That can lead to excessive memory consumption, as the whole HNSW index must be loaded into RAM. One way to improve this is to quantize the float vectors into word/byte vectors and build the HNSW index over those instead. This would only affect memory consumption (and KNN search accuracy), since the original float vectors would still be stored.