Open amitbaer opened 4 years ago
Hi @amitbaer, I hit the same issue you experienced. The solution you mention effectively solves the sniffing issue, but there is then a parsing error with the version number.
ERROR elasticsearch/client.go:397 (*Client).Start.func1 skydive-skydive-analyzer-8649774f78-g6hrh: Elasticsearch not available: Unable to parse the version:
The issue is that when Skydive gets the version at L191, it uses the originally provided URL c.url.String() instead of the URL returned after parsing by the esConfig module at L158.
I believe that, according to olivere/elastic, the fix at L191 should be something like:
vt, err := esClient.ElasticsearchVersion(esConfig.URL)
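For illustration, here is a minimal sketch of that suggestion, assuming olivere/elastic's config.Parse and NewClientFromConfig are used to build the client; the function and variable names are mine, not the exact Skydive code, and the import path depends on the elastic major version in use:

package main

import (
	"fmt"
	"log"

	"github.com/olivere/elastic"        // import path depends on the elastic major version
	"github.com/olivere/elastic/config"
)

func main() {
	// Placeholder URL carrying the same kind of query parameter as in the report.
	version, err := elasticsearchVersion("http://127.0.0.1:9200?sniff=false")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Elasticsearch version:", version)
}

// elasticsearchVersion builds the client from the raw URL (which may carry
// query parameters such as sniff=false) and then queries the version using
// the cleaned-up URL returned by the config parser.
func elasticsearchVersion(rawURL string) (string, error) {
	esConfig, err := config.Parse(rawURL) // parses and strips sniff, credentials, etc.
	if err != nil {
		return "", err
	}
	esClient, err := elastic.NewClientFromConfig(esConfig)
	if err != nil {
		return "", err
	}
	// Use esConfig.URL instead of the original URL so the version request
	// does not carry the unrecognized query parameters.
	return esClient.ElasticsearchVersion(esConfig.URL)
}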
Edit: to be clearer, because we pass parameters to Elasticsearch through the query string of SKYDIVE_STORAGE_ELASTICSEARCH_HOST, the request made to get the version is
curl http://ES_IP:ES_PORT/?sniff=false
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/] contains unrecognized parameter: [sniff]"}],"type":"illegal_argument_exception","reason":"request [/] contains unrecognized parameter: [sniff]"},"status":400}
instead of
curl http://ES_IP:ES_PORT
{
"name" : "elasticsearch-client-777bcd8688-ms4xm",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "40G6CUOPQWaMWmULFviXow",
"version" : {
"number" : "5.6.4",
"build_hash" : "8bbedf5",
"build_date" : "2017-10-31T18:55:38.105Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
}
@amitbaer @Sparika Could you please give this a try: https://github.com/skydive-project/skydive/pull/2138 ? Thanks!
Hi @lebauce,
As I'm using Helm to deploy Skydive, I was not able to apply the fix in my exact scenario, so I replaced my Helm deployment with the built binary. The fix solves the reported issue, as I'm now able to connect to the remote Elasticsearch.
However, I'm hitting another issue: when the binary connects to ES and pushes some data, it seems to corrupt the ES indices, as queries to the Skydive indices now return an error. Before the connection, the indices were fine.
Before:
curl http://ES_IP:ES_PORT/skydive_topology_live_v12/_search | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 639 100 639 0 0 6198 0 --:--:-- --:--:-- --:--:-- 6264
{
"took": 100,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "skydive_topology_live_v12",
"_type": "graph_element",
"_id": "c171d2ee-d356-5a2d-7db1-a4a3cfd4c9d1",
"_score": 1,
"_source": {
"Origin": "analyzer.skydive-skydive-monitoring-analyzer-85567d94b9-7hmmb",
"ArchivedAt": 1580204725747,
"Revision": 1,
"CreatedAt": 1580205203621,
"Metadata": {
"Type": "device",
"Probe": "fabric",
"Name": "TOR1"
},
"Host": "skydive-skydive-monitoring-analyzer-85567d94b9-7hmmb",
"_Type": "node",
"ID": "c171d2ee-d356-5a2d-7db1-a4a3cfd4c9d1",
"DeletedAt": 1580204725747,
"UpdatedAt": 1580205203621
}
}
]
}
}
After:
curl http://ES_IP:ES_PORT/skydive_topology_live_v12/_search | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 215 100 215 0 0 118k 0 --:--:-- --:--:-- --:--:-- 209k
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "maxConcurrentShardRequests must be >= 1"
}
],
"type": "illegal_argument_exception",
"reason": "maxConcurrentShardRequests must be >= 1"
},
"status": 400
}
Error when using a k8s probe on the binary analyzer:
SKYDIVE_STORAGE_MYELASTICSEARCH_DISABLE_SNIFFING=true SKYDIVE_STORAGE_MYELASTICSEARCH_HOST=http://ES_IP:ES_PORT sudo ./skydive-nosniff analyzer -c skydive.yml.default
2020-01-28T09:21:46.898Z INFO analyzer/analyzer.go:42 glob..func1 n0: Skydive Analyzer 0.18.0-36982a150371 starting...
[...]
2020-01-28T09:21:48.023Z INFO analyzer/storage.go:60 newGraphBackendFromConfig n0: Using myelasticsearch (driver elasticsearch) as graph storage backend
2020-01-28T09:21:48.025Z INFO etcd/election.go:102 (*MasterElector).start n0: starting as the master for /master-analyzer-es-graph-flush: n0
2020-01-28T09:21:48.049Z INFO etcd/election.go:102 (*MasterElector).start n0: starting as the master for /master-analyzer-es-rolling-index:35534fa2: n0
2020-01-28T09:21:48.050Z INFO elasticsearch/client.go:229 (*Client).start n0: client started for skydive_topology_live, skydive_topology_archive
2020-01-28T09:21:48.050Z INFO graph/elasticsearch.go:507 (*ElasticSearchBackend).flushGraph n0: Flush graph elements
2020-01-28T09:21:48.278Z ERROR graph/elasticsearch.go:524 (*ElasticSearchBackend).OnStarted n0: Unable to flush graph element: elastic: Error 400 (Bad Request): maxConcurrentShardRequests must be >= 1 [type=illegal_argument_exception]
Well, apparently I got it working by deleting the ES client pods. After I reported the issue, they went into crash loops for other reasons, so I deleted both ES pods, and once the new ones were up, the data was pushed without any problem by both Skydive analyzers to Elasticsearch.
Edit: after a few minutes, the situation is buggy again. I will investigate; it may not be related to the remote ES but rather to two Skydive analyzers pushing to the same ES. Should I open a new issue?
Just a few additional remarks: the ES client library you use specifically parses the URL for several parameters and returns a cleaned-up URL in the configuration object. I would still suggest using this URL from the ES client config rather than the original one, to allow other parameters if needed.
Two such parameters are the username and password, which are transmitted as http://USERNAME:PASSWORD@ES_IP:ES_PORT. We found out about them by looking at the ES client code, and we can use them without causing issues with Skydive. However, it may be useful to document them in the example config files or somewhere else.
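As an illustration of that parsing step, here is a minimal sketch using olivere/elastic's config.Parse with a hypothetical URL (credentials and endpoint are placeholders, and the import path depends on the elastic major version in use):

package main

import (
	"fmt"
	"log"

	"github.com/olivere/elastic/config"
)

func main() {
	// Hypothetical URL carrying credentials and the sniff parameter.
	cfg, err := config.Parse("http://user:secret@127.0.0.1:9200?sniff=false")
	if err != nil {
		log.Fatal(err)
	}
	// cfg.URL is the cleaned-up endpoint (no credentials, no query string),
	// cfg.Username/cfg.Password hold the credentials, and cfg.Sniff reports
	// whether sniffing was requested.
	fmt.Println(cfg.URL, cfg.Username, cfg.Password, *cfg.Sniff)
}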
Thanks for the quick fix! Do you have an ETA for the next version including the fix?
@Sparika Thanks for testing. I agree, we should use the URL parsed by the ES library. We should be able to release the next version pretty soon, probably at the end of next week.
I am trying to use a dockerized external Elasticsearch as the flow/topology backend. For this to work, I needed to disable sniffing, which is enabled by default in the olivere/elastic ES client. When sniffing is enabled, I get a "No ElasticSearch Node Available" error. This is explained in the following issue in the client repo: https://github.com/olivere/elastic/issues/312.
To disable sniffing, I am adding a URL parameter to the ES endpoint supplied in the Skydive YAML file. The final configuration is:
data:
  SKYDIVE_ANALYZER_FLOW_BACKEND: myelasticsearch
  SKYDIVE_ANALYZER_TOPOLOGY_BACKEND: myelasticsearch
  SKYDIVE_STORAGE_MYELASTICSEARCH_DRIVER: elasticsearch
  SKYDIVE_STORAGE_MYELASTICSEARCH_HOST::9200?sniff=false
This solves the first issue (the client is now successfully built), but creates a new one: after the ES client is built, the ES version is polled, and this fails when any additional parameter is added to the ES endpoint (including, but not limited to, the sniff parameter).
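For reference, here is a minimal sketch of what the sniff=false parameter maps to when a client is built with olivere/elastic directly (the URL is a placeholder and the import path depends on the elastic major version in use):

package main

import (
	"log"

	"github.com/olivere/elastic"
)

func main() {
	// Disable sniffing explicitly; this is what sniff=false in the endpoint
	// URL ultimately requests. It avoids the "No ElasticSearch Node Available"
	// error when the node addresses discovered by sniffing (e.g. Docker
	// internal IPs) are not reachable from the client.
	client, err := elastic.NewClient(
		elastic.SetURL("http://127.0.0.1:9200"),
		elastic.SetSniff(false),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = client
}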