jobergum / dense-vector-ranking-performance

Performance evaluation of nearest neighbor search using Vespa, Elasticsearch and Open Distro for Elasticsearch K-NN
Apache License 2.0

Performance evaluation of nearest neighbor search using Vespa and Elasticsearch

In this repository we benchmark the performance of the dense vector type in Elastic and compare it with Vespa.ai's tensor field support and tensor operations.

Elastic recently released support for dense and sparse vectors of up to 1024 dimensions; see the Elastic blog post "Text similarity search with vector fields". The sparse vector type has since been deprecated. We evaluate the performance of nearest neighbor search using Euclidean distance with both Vespa and Elasticsearch.

License

This work is published under the Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0

Introduction

Fast search for the nearest neighbors of a data point in high-dimensional vector space is an important problem for many real-time applications. For example, in computer vision, searching for close data points in high-dimensional vector space enables finding the most similar cats or faces in large image datasets. In information retrieval, large pre-trained multilingual natural language understanding models like BERT allow text sentences to be represented in a dense embedding space, where nearest neighbor search can serve as an effective multilingual semantic retrieval function.

In many of these real-world applications of (approximate) nearest neighbor search, the search is constrained by real-time query filters applied over the data point’s metadata. For example, in e-commerce search applications with constantly evolving metadata, a search for the nearest products for a query in vector space would typically be constrained by product metadata like inventory status and price. There are many open source libraries and algorithms that provide fast approximate nearest neighbor search, (A)NNS; FAISS and Annoy are examples of popular (A)NNS implementations. However, these libraries lack support for efficient metadata filtering during the search in vector space. Search engines, on the other hand, are designed for efficient evaluation of boolean query constraints over indices at scale, but have historically had limited support for storing and indexing vectors or, more generally, tensor fields.

Datasets

Two datasets are evaluated. Both are commonly used when evaluating the performance and accuracy of ANN algorithms, and both are obtained from a great resource on ANN benchmarks: https://github.com/erikbern/ann-benchmarks.

| Dataset | Dimensions | Train size | Test size | Neighbors | Distance | Download |
|---|---|---|---|---|---|---|
| GIST | 960 | 1,000,000 | 1,000 | 100 | Euclidean | HDF5 (3.6GB) |
| SIFT | 128 | 1,000,000 | 10,000 | 100 | Euclidean | HDF5 (501MB) |

The datasets are split into a train set and a test set. We index the train document corpus and evaluate query performance using the vectors in the test set as queries. The task we want to accomplish with both engines is to compute the 10 nearest neighbors as measured by the Euclidean distance between the document and query vectors. Since both engines rank vectors/documents by decreasing relevance/score, we use 1/(1 + Euclidean distance) as our scoring/ranking function.
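
To make the task concrete, here is a minimal brute-force sketch of the scoring function and the top-10 selection in plain Python. It is not code from this repository: the function names and the toy corpus are made up for illustration, and both engines implement the same idea far more efficiently.

```python
import math

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score(query, doc):
    """Map distance to a score in (0, 1]; closer vectors score higher."""
    return 1.0 / (1.0 + euclidean_distance(query, doc))

def top_k(query, docs, k=10):
    """Return the k (doc_id, score) pairs with the highest score."""
    scored = [(doc_id, score(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy corpus (hypothetical data, not taken from GIST or SIFT).
docs = {0: [0.0, 0.0], 1: [3.0, 4.0], 2: [1.0, 0.0]}
ranking = top_k([0.0, 0.0], docs, k=2)
print(ranking)
```

Because 1/(1 + d) decreases strictly as the distance d grows, sorting by descending score gives the same order as sorting by ascending distance.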

Configuration and setup

Building on the official Docker images of Elasticsearch and Vespa.ai, we build two custom Docker images containing the configuration. Using Docker lets us run the benchmark for both engines on the same hardware.

We use the vespa-fbench benchmarking client, as it is already distributed with the Vespa Docker image, is simple to use, and supports HTTP POST. Both engines have similar HTTP-based APIs for feed and search, and we convert the HDF5-formatted datasets to the Vespa and Elastic JSON formats for both query and feed. The HDF5 data files published on http://ann-benchmarks.com are divided into a train set and a test set; we index the vectors in the train set and use the test set vectors for benchmarking performance and accuracy metrics.

Feed API

Both Vespa and Elastic have similar HTTP JSON APIs for feeding documents. The snippet below is from make-feed.py:

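import requests

def feed_to_es_and_vespa(data):
  # data is a (document id, vector) pair; the vector must support .tolist()
  docid, vector = data
  vector = vector.tolist()  # plain Python list, so it serializes as a JSON array
  # Vespa document body: the tensor cells are passed as a flat list of values
  vespa_body = {
    'fields': {
      'vector': {
        'values': vector
      },
      'id': docid
    }
  }
  # Elasticsearch document body: the dense_vector field is a plain JSON array
  es_body = {
    'id': docid,
    'vector': vector
  }
  # Feed the same document to both engines; fail fast on any HTTP error
  response = requests.post('http://localhost:8080/document/v1/test/doc/docid/%i' % docid, json=vespa_body)
  response.raise_for_status()
  response = requests.post('http://localhost:9200/doc/_doc/%i' % docid, json=es_body)
  response.raise_for_status()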

Search API

Both Vespa and Elastic have similar HTTP JSON query APIs for searching. The snippet below is from make-queries.py, which generates the query input for the vespa-fbench HTTP benchmarking client.

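import json

# Iterate over the test vectors and write a JSON-formatted POST query for ES and Vespa.
# test, es_queries and vespa_queries_ann are set up elsewhere in make-queries.py (not shown in this excerpt).
for v in test:
  query_vector = v.tolist()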
  vespa_body_ann = {
    'yql': 'select * from sources * where [{"targetNumHits":%i}]nearestNeighbor(vector,vector);' % 10,
    'hits': 10,
    'ranking.features.query(vector)': query_vector, 
    'ranking.profile': 'euclidean-rank',
    'summary': 'id',
    'timeout': '5s' 
  }

  es_script_query = {
    'script_score': {
      'query': {'match_all': {}},
      'script': {
        'source': '1/(1 + l2norm(params.query_vector, doc[\'vector\']))',
        'params': {'query_vector': query_vector}
      }
    }
  }
  es_body={
    'size': 10,
    'timeout': '5s',
    'query': es_script_query
  }
  es_queries.write('/doc/_search\n')
  es_queries.write(json.dumps(es_body) + '\n')
  vespa_queries_ann.write('/search/\n')
  vespa_queries_ann.write(json.dumps(vespa_body_ann) + '\n')
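
The snippet above writes two lines per query: the URL path, then the JSON body. The following sketch reads such a file back into (path, body) pairs, which can be handy for spot-checking the generated queries before a run. The helper name `read_fbench_queries` is hypothetical and not part of the repository, and the layout is assumed only from what the snippet writes.

```python
import io
import json

def read_fbench_queries(lines):
    """Yield (url_path, body) pairs from alternating path/JSON lines."""
    it = (line.rstrip('\n') for line in lines)
    for path in it:
        body = next(it, None)
        if body is None:
            raise ValueError('path line %r has no JSON body line' % path)
        yield path, json.loads(body)

# Build a one-query sample in memory, in the same layout the snippet writes.
sample = io.StringIO()
sample.write('/doc/_search\n')
sample.write(json.dumps({'size': 10, 'timeout': '5s'}) + '\n')
sample.seek(0)

pairs = list(read_fbench_queries(sample))
print(pairs)
```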

Elastic schema and configuration

index.json:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "false",
    "_source": {
      "enabled": "false"
    },
    "properties": {
      "id": {
        "type": "integer"
      },
      "vector": {
        "type": "dense_vector",
        "dims":960 
      }
    }
  }
}

The Elastic service is started using 8GB of heap:

ES_JAVA_OPTS="-Xms8g -Xmx8g"
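
The index.json above fixes `dims` at 960, which matches the GIST dataset. A run on SIFT would presumably need 128 instead, per the dataset table. As a sketch only, the mapping can be generated for either dimensionality with a small helper; `make_index_body` is a hypothetical name and not something the repository provides.

```python
import json

def make_index_body(dims):
    """Build an index definition shaped like index.json for `dims`-dimensional vectors."""
    return {
        'settings': {'number_of_shards': 1, 'number_of_replicas': 0},
        'mappings': {
            'dynamic': 'false',
            '_source': {'enabled': 'false'},
            'properties': {
                'id': {'type': 'integer'},
                'vector': {'type': 'dense_vector', 'dims': dims},
            },
        },
    }

gist_body = make_index_body(960)  # GIST: 960 dimensions
sift_body = make_index_body(128)  # SIFT: 128 dimensions
print(json.dumps(sift_body, indent=2))
```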

Vespa schema and configuration

Document definition

doc.sd:

search doc {
  document doc {
    field id type int {
      indexing: summary | attribute
    }

    field vector type tensor<float>(x[960]) {
      indexing: attribute
    }
  }
  document-summary id {
    summary id type int { source: id }
  }
  rank-profile euclidean-rank inherits default {
    first-phase {
      expression: 1/(1 + sqrt(sum(join(query(vector), attribute(vector), f(x,y)((x-y)*(x-y))))))
    }
  }
}
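
Read term by term, the first-phase expression of the euclidean-rank profile is the same 1/(1 + Euclidean distance) score that the Elastic script computes with l2norm. Writing q for query(vector) and d for attribute(vector) (symbols introduced here for brevity), it corresponds to:

```latex
\mathrm{score}(q, d) = \frac{1}{1 + \sqrt{\sum_{i=0}^{959} \left(q_i - d_i\right)^2}}
```

The join applies the lambda f(x,y) = (x-y)*(x-y) to each pair of matching cells, sum without a dimension argument reduces the result to a single number, and sqrt turns that sum of squares into the distance. Note that x and y inside the lambda are cell values, not the tensor dimension named x in the field type.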

Vespa application package

Benchmark Results

gist-960-euclidean Results

The following results were obtained on an instance with 1 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.30GHz (Ivy Bridge), using a single shard with Elastic and threads-per-search equal to one with Vespa:

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 0.39 | 2547.42 | 2664.05 | 1.0000 |
| Vespa 7.190.14 | 0.63 | 1572.29 | 1737.99 | 1.0000 |

The following results were obtained on an instance with 1 x Intel(R) Xeon E5-2680 v3 2.50GHz (Haswell), using a single shard with Elastic and threads-per-search equal to one with Vespa:

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 0.57 | 1752.74 | 1850.74 | 1.0000 |
| Vespa 7.190.14 | 1.32 | 756.61 | 955.63 | 1.0000 |

sift-128-euclidean Results

The following results were obtained on an instance with 1 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.30GHz (Ivy Bridge), using a single shard with Elastic and threads-per-search equal to one with Vespa:

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 2.01 | 496.42 | 555.34 | 1.0000 |
| Vespa 7.190.14 | 4.03 | 248.29 | 316.40 | 1.0000 |

The following results were obtained on an instance with 1 x Intel(R) Xeon E5-2680 v3 2.50GHz (Haswell), using a single shard with Elastic and threads-per-search equal to one with Vespa:

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 3.29 | 303.96 | 337.89 | 1.0000 |
| Vespa 7.190.14 | 9.14 | 109.33 | 148.90 | 1.0000 |
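
As a reading aid, the sketch below redoes some arithmetic on the figures reported above. It assumes that the benchmark drives each engine with a single client that sends the next query as soon as the previous one returns, in which case QPS should be close to 1000 divided by the average latency in milliseconds. The latency ratio is derived here from the reported averages and is not a number stated in the results.

```python
# (dataset, CPU) -> {engine: (reported QPS, reported average latency in ms)}
results = {
    ('gist-960', 'Ivy Bridge'): {'Elastic': (0.39, 2547.42), 'Vespa': (0.63, 1572.29)},
    ('gist-960', 'Haswell'): {'Elastic': (0.57, 1752.74), 'Vespa': (1.32, 756.61)},
    ('sift-128', 'Ivy Bridge'): {'Elastic': (2.01, 496.42), 'Vespa': (4.03, 248.29)},
    ('sift-128', 'Haswell'): {'Elastic': (3.29, 303.96), 'Vespa': (9.14, 109.33)},
}

def implied_qps(avg_latency_ms):
    """QPS of one client that issues queries back to back."""
    return 1000.0 / avg_latency_ms

def latency_ratio(row):
    """Elastic average latency divided by Vespa average latency."""
    return row['Elastic'][1] / row['Vespa'][1]

for key, row in results.items():
    implied = {engine: round(implied_qps(lat), 2) for engine, (_, lat) in row.items()}
    print(key, 'latency ratio %.2f' % latency_ratio(row), 'implied QPS', implied)
```

Under that assumption the implied QPS values agree with the reported ones to within about 0.01, and the Elastic-to-Vespa average latency ratio falls between roughly 1.6 and 2.8 across the four runs.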

Benchmark parameters

docker run -v $(pwd)/data/:/tmp/queries --net=host --rm --entrypoint /opt/vespa/bin/vespa-fbench docker.io/vespaengine/vespa \
  -P -H "Content-Type:application/json" -q /tmp/queries/elastic/queries.txt -s 180 -n 1 -c 0 -i 20 -o /tmp/queries/result.es.txt localhost 9200 
docker run -v $(pwd)/data/:/tmp/queries --net=host --rm --entrypoint /opt/vespa/bin/vespa-fbench docker.io/vespaengine/vespa \
  -P -H "Content-Type:application/json" -q /tmp/queries/vespa/queries.txt -s 180 -n 1 -c 0 -i 20 -o /tmp/queries/result.vespa.txt localhost 8080 

Parameter explanation :

Reproducing the benchmarks

The benchmark can be reproduced using Dockerfile.vespa and Dockerfile.elastic. Both images are built on the official Elasticsearch and Vespa Docker images. The following reproduces the benchmark using the gist-960-euclidean dataset, which has 960 dimensions.

Requirements:

Instructions to reproduce the benchmark on the GIST 1M vector dataset

Clone, build containers and run.

$ git clone https://github.com/jobergum/dense-vector-ranking-performance.git; cd dense-vector-ranking-performance
$ ./bin/build.sh 
$ ./bin/run.sh
$ wget http://ann-benchmarks.com/gist-960-euclidean.hdf5

Verify that the two docker containers are running:

$ docker ps |egrep "vespa|es"

Deploy configuration and document schema

Vespa

Verify that configuration service is running and returns 200 OK:

$ docker exec vespa bash -c 'curl -s --head http://localhost:19071/ApplicationStatus'

Upload the Vespa application package with document schema:

$ docker exec vespa bash -c '/opt/vespa/bin/vespa-deploy prepare config && \
    /opt/vespa/bin/vespa-deploy activate'

Elastic

Verify that Elastic service is running and returns 200 OK:

$ docker exec es bash -c 'curl -s --head http://localhost:9200/'

Deploy Elastic index schema

$ docker exec es bash -c '/usr/share/elasticsearch/create-index.sh'

Feed data

Both Vespa and Elastic have batch-oriented feed APIs with higher throughput, but to keep the dependency list short we opt to use the simple HTTP-based APIs. Feeding:

$ python3 ./bin/make-feed.py gist-960-euclidean.hdf5 

Make both engines consolidate their indexes: merge the segments within the shard for Elastic, and flush and merge the memory index for Vespa.

$ curl -s -X POST "http://localhost:9200/doc/_forcemerge?max_num_segments=1"
$ docker exec vespa bash -c '/opt/vespa/bin/vespa-proton-cmd --local triggerFlush'

Run benchmark

$ python3 ./bin/make-queries.py gist-960-euclidean.hdf5 
$ ./bin/do-benchmark.sh 

Check recall

$ python3 ./bin/check-recall.py gist-960-euclidean.hdf5
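
How check-recall.py computes its number is not shown here. A common definition of Recall@10, and the one assumed in this sketch, is the fraction of the true 10 nearest neighbors that appear among the 10 returned hits, averaged over all queries. The function names and the ids below are hypothetical.

```python
def recall_at_k(returned_ids, true_neighbor_ids, k=10):
    """Fraction of the true k nearest neighbors present in the top-k returned ids."""
    expected = set(true_neighbor_ids[:k])
    found = set(returned_ids[:k])
    return len(expected & found) / k

def mean_recall(all_returned, all_true, k=10):
    """Average recall@k over all queries."""
    recalls = [recall_at_k(r, t, k) for r, t in zip(all_returned, all_true)]
    return sum(recalls) / len(recalls)

# Hypothetical ids for two queries, with k=2 to keep the example short.
returned = [[4, 7], [1, 9]]
truth = [[7, 4, 3], [1, 2, 5]]
print(mean_recall(returned, truth, k=2))
```

The order of the returned hits does not matter under this definition, only whether each true neighbor is present in the top k.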