Closed adrianlzt closed 3 years ago
Thank you for this nice improvement. It seems the tests need to be updated:
graffiti/graph/elasticsearch_test.go:98:40: cannot use client (type *fakeESClient) as type elasticsearch.ClientInterface in argument to newElasticSearchBackendFromClient:
Ooops. Fixed @lebauce
run functional-tests-backend-orientdb
The Sync method for the ElasticSearch backend was limited to a maximum of 20k documents (10k nodes + 10k edges). This becomes a hard limit as soon as the skydive database grows beyond 10k nodes (or edges).
Rather than raising that fixed limit, the Sync method now uses the scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results), which retrieves large result sets over several requests.
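As a rough illustration of the scroll-style paging idea (not the code in this PR), the loop below keeps asking for the next batch until the cursor is exhausted. The `pager` interface, `fakePager`, and `scrollAll` are hypothetical names invented for this sketch, the batch size of 10000 is an assumption, and no real Elasticsearch client is involved.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// pager is a hypothetical stand-in for a scroll cursor: each Next call
// returns the next batch of hits, and io.EOF once the cursor is exhausted.
type pager interface {
	Next() ([]string, error)
}

// fakePager serves `total` documents in batches of `size`, with no network.
type fakePager struct {
	total, size, sent int
}

func (f *fakePager) Next() ([]string, error) {
	if f.sent >= f.total {
		return nil, io.EOF
	}
	n := f.size
	if rest := f.total - f.sent; rest < n {
		n = rest
	}
	batch := make([]string, n)
	for i := range batch {
		batch[i] = fmt.Sprintf("doc-%d", f.sent+i)
	}
	f.sent += n
	return batch, nil
}

// scrollAll drains the pager, passing every document to handle. It returns
// the number of documents seen and the number of batches fetched.
func scrollAll(p pager, handle func(string)) (int, int, error) {
	docs, batches := 0, 0
	for {
		batch, err := p.Next()
		if errors.Is(err, io.EOF) {
			return docs, batches, nil
		}
		if err != nil {
			return docs, batches, err
		}
		batches++
		for _, d := range batch {
			handle(d)
			docs++
		}
	}
}

func main() {
	docs, batches, err := scrollAll(&fakePager{total: 51000, size: 10000}, func(string) {})
	fmt.Println(docs, batches, err) // 51000 6 <nil>
}
```

Because each batch is handed to the callback as soon as it arrives, the total number of documents is no longer bounded by a single request's result window.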
A single producer and a single consumer run in separate goroutines, which avoids building a huge slice holding all the results.
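A minimal sketch of that producer/consumer shape, assuming batches of integer IDs instead of real ES hits. `produce` and `consume` are hypothetical helpers written for this example, not functions from the skydive code base.

```go
package main

import "fmt"

// produce builds batches of document IDs in its own goroutine and sends them
// on the returned channel, closing it when done. The data source is simulated.
func produce(total, size int) <-chan []int {
	out := make(chan []int) // unbuffered: the producer blocks until a batch is taken
	go func() {
		defer close(out)
		for start := 0; start < total; start += size {
			end := start + size
			if end > total {
				end = total
			}
			batch := make([]int, 0, end-start)
			for id := start; id < end; id++ {
				batch = append(batch, id)
			}
			out <- batch
		}
	}()
	return out
}

// consume processes batches as they arrive and never accumulates them in one
// slice. It returns the total document count and the largest batch seen.
func consume(in <-chan []int) (count, maxBatch int) {
	for batch := range in {
		count += len(batch)
		if len(batch) > maxBatch {
			maxBatch = len(batch)
		}
	}
	return count, maxBatch
}

func main() {
	count, maxBatch := consume(produce(25, 10))
	fmt.Println(count, maxBatch) // 25 10
}
```

With an unbuffered channel the producer cannot run far ahead of the consumer, so memory use stays proportional to the batch size rather than to the total number of results.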
In a test with 51k nodes in the live index (3.3MB of space), the sync took around 2 seconds and received about 1MB of gzipped data from ES in 6 requests to the scroll API.