opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Nodes don't connect to publish_host via FQDN but via IP (breaks TLS full verification) #8830

Open ruffy91 opened 1 year ago

ruffy91 commented 1 year ago

Describe the bug
When connecting to other nodes in the cluster, OpenSearch uses the IP address that each node resolved locally for itself instead of resolving the published hostname via DNS. This prevents full verification of TLS certificates on Ubuntu, which always resolves the machine's own FQDN to the loopback address 127.0.1.1. It could also affect other systems where hostname resolution is inconsistent between local and remote nodes.

To Reproduce
Steps to reproduce the behavior:

  1. Set node.publish_host to the FQDN of node A (node-a.example.net).
  2. Start another node (node-b) in the cluster. It receives the hostname and IP of the remote node (node-a.example.net/127.0.1.1).
  3. node-b tries to connect to node-a via the IP 127.0.1.1 (as resolved on node-a) instead of via the published FQDN.
  4. This fails because, on node-b, 127.0.1.1 refers to node-b itself, not node-a.
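
The steps above can be sketched as a small simulation. This is only an illustrative model, not OpenSearch code: the `PublishedAddress` class, the helper functions, the two resolver tables, and the address 192.0.2.10 (a documentation-range IP standing in for node-a's real address) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedAddress:
    """Hypothetical stand-in for the hostname/IP pair a node publishes."""
    host: str
    ip: str


# Hypothetical resolver views. On node-a (Ubuntu) its own FQDN maps to loopback.
NODE_A_VIEW = {"node-a.example.net": "127.0.1.1"}
# On node-b, DNS returns node-a's routable address (assumed 192.0.2.10 here).
NODE_B_VIEW = {
    "node-a.example.net": "192.0.2.10",
    "node-b.example.net": "127.0.1.1",
}


def resolve(host: str, view: dict) -> str:
    """Look up a hostname in one node's view of name resolution."""
    return view[host]


def publish(host: str, local_view: dict) -> PublishedAddress:
    """The publishing node resolves its own FQDN locally and ships both values."""
    return PublishedAddress(host=host, ip=resolve(host, local_view))


def target_using_published_ip(published: PublishedAddress) -> str:
    """Reported behavior: the peer dials the IP that the remote node resolved."""
    return published.ip


def target_using_fqdn(published: PublishedAddress, peer_view: dict) -> str:
    """Expected behavior: the peer re-resolves the published FQDN itself."""
    return resolve(published.host, peer_view)


published = publish("node-a.example.net", NODE_A_VIEW)
reported = target_using_published_ip(published)
expected = target_using_fqdn(published, NODE_B_VIEW)

print("reported target:", reported)  # loopback, i.e. node-b itself
print("expected target:", expected)  # node-a's address as seen from node-b
```

In this model the reported behavior makes node-b dial its own loopback address, while re-resolving the FQDN on node-b yields node-a's address and leaves the hostname available for certificate verification.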

Expected behavior
I expect node-b to connect to node-a via the FQDN published by node-a, not its IP address, because I am setting publish_host to an FQDN, not an IP.

Plugins
opensearch-security

Host/Environment (please complete the following information):

Additional context
Elastic has known about this problem since 2020 and wanted to fix it, but never followed through: https://github.com/elastic/elasticsearch/issues/49795
This issue makes it really hard to use hostname verification with TLS.

dblock commented 2 months ago

@ruffy91 Do you know how to fix this, or at least how to implement a failing test for it? I'd appreciate any help.