skydive-project / skydive

An open source real-time network topology and protocols analyzer
https://skydive.network
Apache License 2.0

ElasticSearch: allow recovering more than 10k documents #2400

Closed: adrianlzt closed this 3 years ago

adrianlzt commented 3 years ago

The Sync method for the ElasticSearch backend was limited to a maximum of 20k documents (10k nodes + 10k edges). This becomes a hard limit once the skydive database grows beyond 10k nodes (or edges).

Rather than raising that fixed limit, this change moves the Sync method to the scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results). That API retrieves large result sets over several requests.
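For readers unfamiliar with scrolling, here is a minimal sketch of the request loop, not the code in this PR. It assumes the plain REST endpoints (`POST /<index>/_search?scroll=1m`, then `POST /_search/scroll` with the returned `_scroll_id`) and stops at the first empty page. The names `hit`, `scrollPage`, `fetchFunc`, `scrollAll` and `httpFetch` are made up for illustration, and `main` runs against an in-memory fake so that no cluster is needed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// hit is one search result, decoded loosely for illustration.
type hit map[string]interface{}

// scrollPage holds the fields of a search response that the loop needs.
type scrollPage struct {
	ScrollID string `json:"_scroll_id"`
	Hits     struct {
		Hits []hit `json:"hits"`
	} `json:"hits"`
}

// fetchFunc returns one page; an empty scrollID requests the first page.
type fetchFunc func(scrollID string) (scrollPage, error)

// scrollAll keeps fetching pages until one comes back empty,
// passing every hit to handle and returning the number of hits seen.
func scrollAll(fetch fetchFunc, handle func(hit)) (int, error) {
	total, scrollID := 0, ""
	for {
		page, err := fetch(scrollID)
		if err != nil {
			return total, err
		}
		if len(page.Hits.Hits) == 0 {
			return total, nil
		}
		for _, h := range page.Hits.Hits {
			handle(h)
		}
		total += len(page.Hits.Hits)
		scrollID = page.ScrollID
	}
}

// httpFetch builds a fetchFunc that talks to the REST API (hypothetical helper).
func httpFetch(baseURL, index string, pageSize int) fetchFunc {
	return func(scrollID string) (scrollPage, error) {
		var page scrollPage
		// First request: open a scroll context on the index.
		url := fmt.Sprintf("%s/%s/_search?scroll=1m", baseURL, index)
		body := map[string]interface{}{"size": pageSize}
		if scrollID != "" {
			// Later requests: continue the existing scroll context.
			url = baseURL + "/_search/scroll"
			body = map[string]interface{}{"scroll": "1m", "scroll_id": scrollID}
		}
		payload, err := json.Marshal(body)
		if err != nil {
			return page, err
		}
		resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
		if err != nil {
			return page, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return page, fmt.Errorf("unexpected status %s", resp.Status)
		}
		err = json.NewDecoder(resp.Body).Decode(&page)
		return page, err
	}
}

func main() {
	// In-memory stand-in for a cluster: one page with two hits, then an empty page.
	var first, last scrollPage
	first.ScrollID = "s1"
	first.Hits.Hits = []hit{{"_id": "n1"}, {"_id": "n2"}}
	pages := []scrollPage{first, last}
	next := 0
	fake := func(string) (scrollPage, error) {
		page := pages[next]
		next++
		return page, nil
	}
	total, err := scrollAll(fake, func(h hit) { fmt.Println("hit", h["_id"]) })
	fmt.Println(total, err)
}
```

Against a real cluster the fake would be swapped for something like `httpFetch("http://localhost:9200", "some_index", 10000)`, where the URL and index name are placeholders. A complete client would also clear the scroll context once it is done.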

A producer/consumer pattern, with one producer and one consumer in separate goroutines, is used to avoid building a huge slice holding all the results.
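The general shape of that pattern can be sketched as below. This is an illustration under assumptions, not the PR's code: `document`, `produce`, `consume` and `syncAll` are hypothetical names, and the pages are plain slices standing in for scroll responses. The producer goroutine pushes documents onto a bounded channel and the consumer handles them one at a time, so only the buffered documents are held in the channel at any moment.

```go
package main

import "fmt"

// document is a hypothetical stand-in for a node or edge read from the backend.
type document struct{ ID string }

// produce sends every document of every page on out, then closes the channel
// so that the consumer knows there is nothing more to read.
func produce(pages [][]document, out chan<- document) {
	defer close(out)
	for _, page := range pages {
		for _, doc := range page {
			out <- doc
		}
	}
}

// consume handles documents as they arrive and returns how many it processed.
func consume(in <-chan document, handle func(document)) int {
	n := 0
	for doc := range in {
		handle(doc)
		n++
	}
	return n
}

// syncAll runs the producer in its own goroutine and consumes in the caller.
// The small buffer bounds how many documents are queued between the two.
func syncAll(pages [][]document, handle func(document)) int {
	ch := make(chan document, 100)
	go produce(pages, ch)
	return consume(ch, handle)
}

func main() {
	pages := [][]document{{{ID: "n1"}, {ID: "n2"}}, {{ID: "n3"}}}
	n := syncAll(pages, func(d document) { fmt.Println("synced", d.ID) })
	fmt.Println("total:", n)
}
```

Because the channel is bounded, a slow consumer simply makes the producer block instead of letting results pile up in memory.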

Tested with 51k nodes in the live index, using 3.3MB of space: the sync took around 2 seconds and received around 1MB of gzipped data from ES in 6 requests to the scroll API.

lebauce commented 3 years ago

Thank you for this nice improvement. It seems the tests need to be updated:

graffiti/graph/elasticsearch_test.go:98:40: cannot use client (type *fakeESClient) as type elasticsearch.ClientInterface in argument to newElasticSearchBackendFromClient:

adrianlzt commented 3 years ago

Ooops. Fixed @lebauce

lebauce commented 3 years ago

run functional-tests-backend-orientdb

lebauce commented 3 years ago

run functional-tests-backend-orientdb