Hi Nelson, before we created this plugin we used a script to run cosine similarity. When we switched to the plugin we gained more than an order of magnitude in performance! A script runs at the Elasticsearch level, while this plugin runs inside the internal, highly optimized Lucene level.
The 80ms was achieved using 4 m4.10xlarge machines in a cluster with about 50 shards. As a rule of thumb, the more shards you have, the better the latency, but note that throughput will decline. Why? Elasticsearch allocates a CPU core per shard, so more shards means more active cores when processing a single search query, while fewer shards means fewer cores per query, which leaves room for more concurrent queries.
I see, very interesting, thank you for the information!
What level of throughput can you serve with this setup? My estimate would be: with 4 m4.10xlarges you have 160 virtual cores (80 physical cores); with 50 shards, roughly 2-3 queries can be served simultaneously, and if each query takes 80ms, that works out to about 25-50 queries per second?
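A quick back-of-the-envelope check of that estimate, using the numbers from this thread and the one-core-per-shard rule of thumb given above:

```python
# Rough sanity check of the throughput estimate (numbers from this thread).
vcpus = 4 * 40                 # four m4.10xlarge instances, 40 vCPUs each
shards = 50                    # one busy core per shard during a query
concurrent = vcpus / shards    # ~3.2 queries in flight at a time
latency_s = 0.080              # per-query latency
print(concurrent / latency_s)  # ~40 QPS, within the 25-50 range above
```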
Also, do you find that the EBS storage on m4 instances creates any I/O bottleneck? I am wondering if reading from local SSDs could improve performance slightly as well.
As an aside, there may be a small performance optimization opportunity in the plugin. For cosine similarity, the plugin could pre-compute the magnitude of each vector, for both the query and indexed vectors (at search time and indexing time respectively), and store it as an extra (N+1)th array element, so a 50-dimensional vector would have 51 values, where the 51st value is the magnitude. That way the magnitude of each vector need not be recomputed for each distance calculation.
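A minimal numpy sketch of that idea (illustrative only, not the plugin's actual storage format):

```python
import numpy as np

def with_magnitude(vec):
    """Append the precomputed magnitude as the (N+1)th element,
    so it is computed once at indexing time."""
    vec = np.asarray(vec, dtype=np.float32)
    return np.append(vec, np.linalg.norm(vec))

def cosine(stored_a, stored_b):
    """Cosine similarity that reuses the stored magnitudes instead of
    recomputing them for every distance calculation."""
    a, norm_a = stored_a[:-1], stored_a[-1]
    b, norm_b = stored_b[:-1], stored_b[-1]
    return float(np.dot(a, b) / (norm_a * norm_b))
```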
Your calculations are correct. We actually reduced the number of shards to 10, gaining throughput. We were able to reduce latency by using k-means to divide the corpus into clusters; at query time we search only the X nearest clusters, where X is tuned beforehand to provide a KNN accuracy of 98%.
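A sketch of that cluster-routing idea in Python (scikit-learn for k-means; the corpus size, cluster count, and names here are illustrative, not the actual setup):

```python
import numpy as np
from sklearn.cluster import KMeans

# Offline: partition the corpus into clusters (sizes are placeholders).
corpus = np.random.rand(100_000, 64).astype(np.float32)
kmeans = KMeans(n_clusters=100, n_init=10).fit(corpus)

def candidate_indices(query, x):
    """Return corpus indices belonging to the X clusters whose centroids
    are nearest to the query; only these candidates are scored exactly.
    X is tuned offline against exact KNN until recall reaches ~98%."""
    dists = np.linalg.norm(kmeans.cluster_centers_ - query, axis=1)
    nearest_clusters = np.argsort(dists)[:x]
    return np.where(np.isin(kmeans.labels_, nearest_clusters))[0]
```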
Regarding the magnitude - you're right. Internally we preferred to normalize all our vectors and use dot product instead of cosine similarity, thus bypassing the magnitude calculation entirely.
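For unit-length vectors the dot product and cosine similarity coincide, so normalizing once at indexing time removes the magnitude math from the query path; a small sketch:

```python
import numpy as np

def normalize(vec):
    """L2-normalize once at indexing time."""
    vec = np.asarray(vec, dtype=np.float32)
    return vec / np.linalg.norm(vec)

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])
# For unit vectors, cos(a, b) equals a . b, so no magnitudes at query time.
print(np.dot(a, b))
```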
Ah! K-means is a great idea for reducing the number of calculations further. Thanks for the tip and for answering my questions, very helpful!
Hello, I am quite impressed by your 80ms latency for 64-dimensional floats and ~4 million items. What does your infrastructure look like? Does this include parallelization via sharding? What hardware type are you using? Is 80ms on a single machine?
I have a similarly sized corpus: 5 million documents, 50-dimensional floats. I wrote a KNN function using a script in Elasticsearch’s Painless language, and it takes about 13 seconds to score the corpus by nearest neighbors on a single AWS i3.4xlarge EC2 instance.
I am curious whether using a plugin rather than Painless will give me significantly better performance... but I wanted to understand how you achieved such good numbers before I invest in the plugin approach.