olivere / elastic

Deprecated: Use the official Elasticsearch client for Go at https://github.com/elastic/go-elasticsearch
https://olivere.github.io/elastic/
MIT License
7.43k stars, 1.15k forks

Sniff support should set scheme to the same as from the initial provided list of URLs #1569

Open archoversight opened 2 years ago

archoversight commented 2 years ago

Which version of Elastic are you using?

[x] elastic.v7 (for Elasticsearch 7.x)
[ ] elastic.v6 (for Elasticsearch 6.x)
[ ] elastic.v5 (for Elasticsearch 5.x)
[ ] elastic.v3 (for Elasticsearch 2.x)
[ ] elastic.v2 (for Elasticsearch 1.x)

Please describe the expected behavior

Telegraf uses Elastic and allows sniffing; however, it does not call SetScheme here:

https://github.com/influxdata/telegraf/blob/master/plugins/outputs/elasticsearch/elasticsearch.go#L191-L286

However, this shouldn't matter, because the initial URL we sniff from is already set with the correct scheme (https://elastic01.example.com:9200). The same scheme should be used for the results that come back from sniffing. Right now it is set to c.Scheme here:

https://github.com/olivere/elastic/blob/release-branch.v7/client.go#L1006

This means it uses whatever is in the config (with a default value of http), and then fails entirely: it can't connect to any of the sniffed nodes because their URLs have the wrong scheme (http instead of https). It also seems that the sniffed entries overwrite the provided URLs.

NewClient should probably try to be a little smarter and assume the scheme that it is provided when initially sniffing.
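To make the suggestion concrete, here is a minimal sketch of what "assume the scheme that it is provided" could look like, using only the Go standard library. The helper name inferScheme and the fallback parameter are hypothetical; this is not code from olivere/elastic, just an illustration of the idea.

```go
package main

import (
	"fmt"
	"net/url"
)

// inferScheme returns the scheme of the first seed URL that parses as http
// or https, or fallback if none does. Hypothetical helper for illustration;
// it is not part of olivere/elastic.
func inferScheme(seedURLs []string, fallback string) string {
	for _, raw := range seedURLs {
		u, err := url.Parse(raw)
		if err != nil {
			continue
		}
		// Accept only the two schemes Elasticsearch serves over, so that a
		// bare "host:port" string is not mistaken for a URL with a scheme.
		if u.Scheme == "http" || u.Scheme == "https" {
			return u.Scheme
		}
	}
	return fallback
}

func main() {
	seeds := []string{"https://elastic01.example.com:9200"}
	scheme := inferScheme(seeds, "http")
	// A sniffed host:port is combined with the scheme of the seed URL.
	fmt.Println(scheme + "://" + "10.110.40.6:9200")
}
```

The fallback keeps today's behavior (the configured scheme, http by default) whenever the seed URLs carry no usable scheme.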

Please describe the actual behavior

It fails to connect to the Elasticsearch cluster even though the initial URL is perfectly valid and sniffing has returned valid data.

Any steps to reproduce the behavior?

N/A

olivere commented 2 years ago

Thanks for bringing this up.

The process of sniffing is described in the wiki. We rely on the data returned by Elasticsearch itself. Off the top of my head, I can't remember how ES signals internally whether to use http or https, but I think the Nodes Info API returns an https key instead of an http key in the structure.

Blindly using the scheme of the initial request is not the right choice, because the cluster might be configured differently.

I have to check if it works correctly in your usage scenario with https.

EDIT: Maybe you can send the output of curl 'https://elastic01.example.com:9200/_nodes/http?pretty=true'?

archoversight commented 2 years ago

@olivere

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "metrics-alerting",
  "nodes" : {
    "_Xw7OJ0KRFW5gMWjLEAqpg" : {
      "name" : "elastic02",
      "transport_address" : "10.110.40.10:9300",
      "host" : "elastic02.test.example.internal",
      "ip" : "10.110.40.10",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "16869490688",
        "transform.node" : "true"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic02.test.example.internal/10.110.40.10:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "mn1VzKM4SVGB3Q93LWUj3g" : {
      "name" : "elastic03",
      "transport_address" : "10.110.40.8:9300",
      "host" : "elastic03.test.example.internal",
      "ip" : "10.110.40.8",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "16869490688",
        "transform.node" : "true"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic03.test.example.internal/10.110.40.8:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "cJRANNsCRfqQhTKORa0kaw" : {
      "name" : "elastic01",
      "transport_address" : "10.110.40.6:9300",
      "host" : "elastic01.test.example.internal",
      "ip" : "10.110.40.6",
      "version" : "7.16.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "33731575808",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "512",
        "ml.max_jvm_size" : "16869490688"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "elastic01.test.example.internal/10.110.40.6:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
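For reference, the http section of each node above carries a host and port in publish_address but no explicit scheme. A minimal Go sketch of decoding this response and extracting host:port per node follows. The type and function names (nodesInfo, hostPort, publishAddresses) are hypothetical, and keeping the part after the slash (the IP) is an assumption; a client that verifies TLS certificates may prefer the hostname part instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nodesInfo models only the fields of the Nodes Info response used here.
type nodesInfo struct {
	Nodes map[string]struct {
		Name string `json:"name"`
		HTTP struct {
			PublishAddress string `json:"publish_address"`
		} `json:"http"`
	} `json:"nodes"`
}

// hostPort strips an optional "hostname/" prefix from a publish_address,
// e.g. "elastic02.test.example.internal/10.110.40.10:9200" becomes
// "10.110.40.10:9200". Addresses without a slash are returned unchanged.
func hostPort(publishAddress string) string {
	if i := strings.LastIndex(publishAddress, "/"); i >= 0 {
		return publishAddress[i+1:]
	}
	return publishAddress
}

// publishAddresses maps each node name to its host:port.
func publishAddresses(body []byte) (map[string]string, error) {
	var info nodesInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(info.Nodes))
	for _, n := range info.Nodes {
		out[n.Name] = hostPort(n.HTTP.PublishAddress)
	}
	return out, nil
}

func main() {
	// Trimmed-down copy of one node from the response above.
	body := []byte(`{"nodes":{"_Xw7OJ0KRFW5gMWjLEAqpg":{"name":"elastic02",` +
		`"http":{"publish_address":"elastic02.test.example.internal/10.110.40.10:9200"}}}}`)
	addrs, err := publishAddresses(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs["elastic02"])
}
```

Nothing in the decoded structure says http or https, so the scheme has to come from somewhere else, such as the client configuration or the URL used for the sniff request.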

These are docker systems configured using the flags provided in this docker-compose.yml: https://github.com/elastic/stack-docs/blob/main/docs/en/getting-started/docker/docker-compose.yml

None of the nodes are available over anything but HTTPS, this is an attempt to reach them over HTTP:

# curl http://elastic01.test.example.internal:9200/_nodes/http?pretty=true -k -u elastic -vvv
Enter host password for user 'elastic':
*   Trying 10.110.40.6...
* TCP_NODELAY set
* Connected to elastic01.test.example.internal (10.110.40.6) port 9200 (#0)
* Server auth using Basic with user 'elastic'
> GET /_nodes/http?pretty=true HTTP/1.1
> Host: elastic01.test.example.internal:9200
> Authorization: Basic [masked] 
> User-Agent: curl/7.58.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host elastic01.test.example.internal left intact
curl: (52) Empty reply from server

archoversight commented 2 years ago

After spending some time spelunking through elasticsearch-py and elastic-transport-python, the official Python packages from Elastic, I found that the client takes the initial node provided by the user and stores it in a NodeConfig:

https://github.com/elastic/elastic-transport-python/blob/42320f5b5b75391b7ae7624ab7b9c058d7a8f173/elastic_transport/_models.py#L216

Then when it has sniffed the node, it calls:

https://github.com/elastic/elasticsearch-py/blob/7478eecc46fc085dc27d40c589127dda80d727fe/elasticsearch/_sync/client/_base.py#L199-L201

Which has:

meta.node.replace(host=host, port=port)

Which just replaces the host and port values in the original NodeConfig it used to do the sniffing (meta.node is the NodeConfig for the node that was used to perform the request to the API).

https://github.com/elastic/elastic-transport-python/blob/42320f5b5b75391b7ae7624ab7b9c058d7a8f173/elastic_transport/_models.py#L292

This would mean that node.scheme would be unchanged, and continue to be https if the initial NodeConfig was also https.
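The same pattern can be sketched in Go to show why the scheme survives. The nodeConfig type and its replace and url methods below are hypothetical analogues of the Python NodeConfig, written for illustration only; they are not taken from any Elastic or olivere package.

```go
package main

import "fmt"

// nodeConfig is a hypothetical Go analogue of the Python NodeConfig.
type nodeConfig struct {
	Scheme string
	Host   string
	Port   int
}

// replace returns a copy with host and port swapped in. The value receiver
// means the original is left untouched, and every other field, including
// Scheme, is carried over unchanged.
func (n nodeConfig) replace(host string, port int) nodeConfig {
	n.Host = host
	n.Port = port
	return n
}

// url renders the node as scheme://host:port.
func (n nodeConfig) url() string {
	return fmt.Sprintf("%s://%s:%d", n.Scheme, n.Host, n.Port)
}

func main() {
	// The node the user configured, which is also used for the sniff request.
	seed := nodeConfig{Scheme: "https", Host: "elastic01.example.com", Port: 9200}
	// A sniffed node inherits everything except host and port.
	sniffed := seed.replace("10.110.40.10", 9200)
	fmt.Println(sniffed.url())
}
```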

Unfortunately, I can't find any official documentation on how to implement sniffing, but it seems to me that the scheme is expected to be the same as that of the original URL provided to kick off the sniffing in the first place.

The initial URL string provided by the user is parsed here:

https://github.com/elastic/elastic-transport-python/blob/42320f5b5b75391b7ae7624ab7b9c058d7a8f173/elastic_transport/client_utils.py#L184

olivere commented 2 years ago

A lot has happened in this area, it seems. I've not heard of any problems from other users in at least the last two major versions. But maybe people leave sniffing disabled, which is also the default in the official drivers.

Anyway, the official Go driver uses this package as a transport, and uses this algorithm to do node discovery.

I will have to review the changes.

EDIT: typo.

archoversight commented 2 years ago

Sorry, I reviewed the Python packages as I was more familiar with Python.

But the Go driver is a lot less complicated!

https://github.com/elastic/elastic-transport-go/blob/main/elastictransport/discovery.go#L125

sets the scheme to whatever the scheme is for the first URL provided to that function, which would be HTTPS in our case.

olivere commented 2 years ago

I will see if I can pick that up for the next release. Thanks for looking into this.