archoversight opened this issue 2 years ago
Thanks for bringing this up.
The process of sniffing is described in the wiki. We're relying on the data returned by Elasticsearch itself. Off the top of my head, I can't remember how ES internally indicates whether to use http or https, but I think the Nodes Info API returns an https key instead of an http key in the structure.
Blindly using the scheme of the initial request is not the right choice, because the cluster might be configured differently.
I have to check whether it works correctly in your usage scenario with https.
EDIT: Maybe you can send the output of curl 'https://elastic01.example.com:9200/_nodes/http?pretty=true'?
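For illustration, roughly the rule I mean, as a Go sketch (this is not the actual client code, and whether the Nodes Info API really returns an https key is exactly what I'd need to verify; the struct fields are assumptions for this sketch):

// Sketch only: prefer https when a node's info carries an "https" section,
// otherwise fall back to http.
package main

import (
	"encoding/json"
	"fmt"
)

type nodeHTTPInfo struct {
	PublishAddress string `json:"publish_address"`
}

type nodeInfo struct {
	HTTP  *nodeHTTPInfo `json:"http,omitempty"`
	HTTPS *nodeHTTPInfo `json:"https,omitempty"`
}

func schemeFor(n nodeInfo) string {
	if n.HTTPS != nil {
		return "https"
	}
	return "http"
}

func main() {
	raw := `{"http":{"publish_address":"elastic02.test.example.internal/10.110.40.10:9200"}}`
	var n nodeInfo
	if err := json.Unmarshal([]byte(raw), &n); err != nil {
		panic(err)
	}
	// Prints "http": the response alone doesn't reveal that the node is TLS-only.
	fmt.Println(schemeFor(n))
}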
@olivere
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "metrics-alerting",
  "nodes" : {
    "_Xw7OJ0KRFW5gMWjLEAqpg" : {
      "name" : "elastic02",
      "transport_address" : "10.110.40.10:9300",
      "host" : "elastic02.test.example.internal",
      "ip" : "10.110.40.10",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "16869490688",
        "transform.node" : "true"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic02.test.example.internal/10.110.40.10:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "mn1VzKM4SVGB3Q93LWUj3g" : {
      "name" : "elastic03",
      "transport_address" : "10.110.40.8:9300",
      "host" : "elastic03.test.example.internal",
      "ip" : "10.110.40.8",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "16869490688",
        "transform.node" : "true"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic03.test.example.internal/10.110.40.8:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "cJRANNsCRfqQhTKORa0kaw" : {
      "name" : "elastic01",
      "transport_address" : "10.110.40.6:9300",
      "host" : "elastic01.test.example.internal",
      "ip" : "10.110.40.6",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "512",
        "ml.max_jvm_size" : "16869490688"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic01.test.example.internal/10.110.40.6:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
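Note that the publish_address values above only carry a hostname/IP and a port, with no scheme, so a client has to decide the scheme some other way. A minimal Go sketch (the helper is mine, not from any of the libraries discussed) of splitting that hostname/ip:port form:

package main

import (
	"fmt"
	"strings"
)

// hostPortFromPublishAddress turns "hostname/ip:port" (or plain "ip:port")
// into a dialable host:port, preferring the hostname when one is present.
func hostPortFromPublishAddress(addr string) string {
	i := strings.Index(addr, "/")
	if i < 0 {
		return addr // already "ip:port"
	}
	host := addr[:i]
	if j := strings.LastIndex(addr[i+1:], ":"); j >= 0 {
		return host + addr[i+1:][j:] // keep the hostname, reuse the port
	}
	return host
}

func main() {
	fmt.Println(hostPortFromPublishAddress("elastic02.test.example.internal/10.110.40.10:9200"))
	// elastic02.test.example.internal:9200 — still no scheme to be found here
}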
These are Docker systems configured using the flags provided in this docker-compose.yml: https://github.com/elastic/stack-docs/blob/main/docs/en/getting-started/docker/docker-compose.yml
None of the nodes are available over anything but HTTPS; this is an attempt to reach them over HTTP:
# curl http://elastic01.test.example.internal:9200/_nodes/http?pretty=true -k -u elastic -vvv
Enter host password for user 'elastic':
* Trying 10.110.40.6...
* TCP_NODELAY set
* Connected to elastic01.test.example.internal (10.110.40.6) port 9200 (#0)
* Server auth using Basic with user 'elastic'
> GET /_nodes/http?pretty=true HTTP/1.1
> Host: elastic01.test.example.internal:9200
> Authorization: Basic [masked]
> User-Agent: curl/7.58.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host elastic01.test.example.internal left intact
curl: (52) Empty reply from server
After spending some time spelunking through the elasticsearch-py and elasticsearch-transport-py packages, which are the official Python packages from Elastic, I found that the transport takes the initial node provided by the user and sticks it into a NodeConfig:
Then when it has sniffed the node, it calls:
Which has:
meta.node.replace(host=host, port=port)
Which just replaces the host and port values in the original NodeConfig it used to do the sniffing (meta.node is the NodeConfig for the node that was used to perform the request to the API).
This would mean that node.scheme would be unchanged, and would continue to be https if the initial NodeConfig was also https.
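In Go terms, the same pattern would look roughly like the sketch below (purely illustrative, using net/url; this is not the actual code of either the Python or the Go transport): copy the seed URL so its scheme survives, and only swap in the sniffed host and port.

package main

import (
	"fmt"
	"net/url"
)

// nodeURLFromSeed copies the seed URL (so its scheme, e.g. https, is kept)
// and only replaces the host and port with the sniffed values.
func nodeURLFromSeed(seed *url.URL, host, port string) *url.URL {
	u := *seed
	u.Host = host + ":" + port
	return &u
}

func main() {
	seed, err := url.Parse("https://elastic01.example.com:9200")
	if err != nil {
		panic(err)
	}
	fmt.Println(nodeURLFromSeed(seed, "elastic02.test.example.internal", "9200"))
	// https://elastic02.test.example.internal:9200
}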
Unfortunately I can't find any official documentation for how to implement sniffing, but it seems to me that scheme in this case is expected to be the same as the original URL that was provided to kick off the sniffing in the first place.
The initial URL string provided by the user is parsed here:
A lot has happened in this area, it seems. I've not heard of any problems from other users in at least the last two major versions. But maybe people simply leave sniffing disabled, as is the default in the official drivers as well.
Anyway, the official Go driver uses this package as a transport, and uses this algorithm to do node discovery.
I will have to review the changes.
EDIT: typo.
Sorry, I reviewed the Python packages as I was more familiar with Python.
But the Go driver is a lot less complicated!
https://github.com/elastic/elastic-transport-go/blob/main/elastictransport/discovery.go#L125 sets the scheme to whatever the scheme is for the first URL provided to that function, which would be HTTPS in our case.
I will see if I can pick that up for the next release. Thanks for tracking this down.
Which version of Elastic are you using?
[x] elastic.v7 (for Elasticsearch 7.x)
[ ] elastic.v6 (for Elasticsearch 6.x)
[ ] elastic.v5 (for Elasticsearch 5.x)
[ ] elastic.v3 (for Elasticsearch 2.x)
[ ] elastic.v2 (for Elasticsearch 1.x)
Please describe the expected behavior
telegraf uses Elastic and allows for sniffing; however, it does not call SetScheme here: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/elasticsearch/elasticsearch.go#L191-L286
However, this shouldn't matter, because the initial URL we want to sniff from is set correctly with a scheme (https://elastic01.example.com:9200). The same scheme should be used when getting the results back from sniffing; right now it is set to c.Scheme here: https://github.com/olivere/elastic/blob/release-branch.v7/client.go#L1006
Which means that it uses whatever is in the config (with a default value of http), and then it fails entirely because it can't connect to any of the sniffed nodes, since they have the wrong URL (http instead of https). It also seems like the sniffed entries overwrite the provided URLs. NewClient should probably try to be a little smarter and assume the scheme it is provided when initially sniffing.
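For a program that constructs the client directly (telegraf doesn't expose this), explicitly setting the scheme should avoid the problem, since the sniffed node URLs are then built from c.Scheme. A minimal sketch, assuming the olivere/elastic v7 option functions; the URL and credentials are placeholders:

package main

import (
	"log"

	"github.com/olivere/elastic/v7"
)

func main() {
	// Pin the scheme so that nodes discovered by sniffing are addressed over
	// https instead of the default http.
	client, err := elastic.NewClient(
		elastic.SetURL("https://elastic01.example.com:9200"),
		elastic.SetScheme("https"),
		elastic.SetSniff(true),
		elastic.SetBasicAuth("elastic", "changeme"),
	)
	if err != nil {
		// With the default http scheme, this is where connecting to the
		// sniffed nodes fails in the scenario described above.
		log.Fatal(err)
	}
	_ = client // NewClient has already sniffed the cluster at this point
}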
Please describe the actual behavior
It fails to connect to the Elasticsearch cluster even though the initial URL is perfectly valid and sniffing has returned valid data.
Any steps to reproduce the behavior?
N/A