Opensearch elasticsearch cluster health couldn't able to get after node restarted

ganilmca commented 3 years ago

Hi Team,

We installed OpenSearch on 3 nodes. Initially we were able to get the cluster health details on all 3 nodes, as shown below.

[elastic@es2 opensearch]$ curl -k -u admin:admin -XGET https://es2:9200/_cluster/health?pretty
{
  "cluster_name" : "opensearch-elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "active_primary_shards" : 1,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[elastic@es2 opensearch]$ date
Fri Jul 16 17:11:09 IST 2021
[elastic@es opensearch]$

After restarting the es1 host, we got the error below.

[elastic@es1 opensearch]$ curl -k -u admin:admin -XGET https://es1:9200/_cluster/health?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception cluster:monitor/health"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception cluster:monitor/health"
  },
  "status" : 500
}
[elastic@es1 opensearch]$ date
Fri Jul 16 17:13:26 IST 2021
[elastic@es1 opensearch]$

We found the following error in the logs.

[2021-07-16T17:13:17,436][ERROR][o.o.s.f.SecurityFilter ] [es1] Unexpected exception java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError: null
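
The log line above shows only the top of the stack trace. `java.lang.ExceptionInInitializerError` means that a static initializer threw, and the underlying exception normally appears further down the trace on lines starting with `Caused by:`. Below is a minimal sketch in Python that pulls those lines out of a log; the function name `root_causes` is hypothetical, and it assumes the full trace was written to the log file under path.logs.

```python
import re

# Matches lines such as "Caused by: java.lang.IllegalStateException: some message"
CAUSE = re.compile(r"^\s*Caused by: (?P<cls>[\w.$]+)(?:: (?P<msg>.*))?$")


def root_causes(log_text: str) -> list:
    """Return (exception class, message) for every 'Caused by:' line, in order."""
    causes = []
    for line in log_text.splitlines():
        match = CAUSE.match(line)
        if match:
            # The message part is optional in a Java stack trace.
            causes.append((match.group("cls"), match.group("msg") or ""))
    return causes
```

The last entry returned is usually the most specific cause and is the most useful part to post in the issue.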

If the es2 host is restarted, we see a similar issue, and the same happens for the es3 host.

If all 3 hosts are restarted, we can get the cluster health on only 1 node (we can't say which particular node). We cannot get the cluster health on all 3 nodes again unless we remove everything on all 3 nodes and do a fresh installation.
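
To compare the nodes side by side after a restart, the per-node check above can be scripted. This is a minimal sketch in Python, assuming the host names es1, es2 and es3 and the admin:admin credentials from the commands above; `classify`, `fetch_health` and `report` are hypothetical helper names, and certificate verification is disabled to mirror `curl -k`.

```python
import base64
import json
import ssl
import urllib.error
import urllib.request


def classify(body: dict) -> str:
    """Reduce a _cluster/health response body to a one-line summary."""
    if "error" in body:
        err = body["error"]
        return f"ERROR {body.get('status')}: {err.get('type')}: {err.get('reason')}"
    return f"{body.get('status')} ({body.get('number_of_nodes')} nodes)"


def fetch_health(host: str, user: str = "admin", password: str = "admin") -> dict:
    """GET https://<host>:9200/_cluster/health without verifying the certificate."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}:9200/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        # A 500 response still carries a JSON error body like the one shown above.
        return json.load(exc)


def report(hosts) -> None:
    """Print one summary line per host."""
    for host in hosts:
        try:
            print(host, classify(fetch_health(host)))
        except (OSError, ValueError) as exc:  # unreachable host or non-JSON body
            print(host, f"no usable response: {exc}")
```

Calling `report(["es1", "es2", "es3"])` after each restart shows at a glance which nodes answer with a health status and which return the security_exception.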

We have followed the below steps to install:

After untarring the package "opensearch-1.0.0-linux-x64.tar.gz", we added the lines below to opensearch.yml:

cluster.name: opensearch-elasticsearch
node.name: ${HOSTNAME}
path.data: /var/lib/scylla/elastic/opensearch/data
path.logs: /var/lib/scylla/elastic/opensearch/logs
network.host: x.x.x.x
http.port: 9200
transport.tcp.port: 9300
discovery.seed_hosts: ["x.x.x.x:9300","x.x.x.x:9300","x.x.x.x:9300"]
cluster.initial_master_nodes: ["x.x.x.x","x.x.x.x","x.x.x.x"]

The other 2 nodes use the same config, except for network.host.

We then started the script "opensearch-tar-install.sh".

Please help us resolve this issue.

Thank you.

frotsch commented 3 years ago

This issue should IMHO be moved to the "security" repo because it originates from the security plugin.

frotsch commented 3 years ago

I cannot reproduce this issue.

What I did:

1) Create a docker compose file with three opensearch nodes (see below)
2) docker compose up
3) Wait until the cluster is ready and curl https://localhost:9200/_cluster/health?pretty -k -u admin:admin reports three nodes
4) Kill one node (by killing the container)
5) curl https://localhost:9200/_cluster/health?pretty -k -u admin:admin reports two nodes (which is correct)
6) Start the killed container again, wait a few secs
7) curl https://localhost:9200/_cluster/health?pretty -k -u admin:admin reports three nodes (which is correct)
8) No exceptions in the logs of any node

docker-compose.yml

services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-node3:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node3
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node3
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data3:/usr/share/opensearch/data
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:
  opensearch-data3:

networks:
  opensearch-net:
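
The wait-and-check steps of the reproduction above (steps 3, 5 and 7) can be automated instead of re-running curl by hand. This is a minimal sketch in Python; `wait_for_nodes` is a hypothetical helper, and `get_health` is assumed to be any function that returns the parsed JSON of `_cluster/health` for one node.

```python
import time
from typing import Callable


def wait_for_nodes(get_health: Callable[[], dict], expected: int,
                   timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll until the health response reports `expected` nodes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if get_health().get("number_of_nodes") == expected:
                return True
        except OSError:
            pass  # node not reachable yet (for example, still starting); keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the restart scenario, one would call it with `expected=2` after killing a node and with `expected=3` after starting it again.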

ganilmca commented 3 years ago

Hi Team,

We have deployed the cluster on VMs (a 3-node cluster), not in Docker. Please help us resolve this issue on a VM-based cluster.

Please let us know if any information is required from our end.

Thank you.

xuezhou25 commented 3 years ago

I would like to take up this issue.

xuezhou25 commented 3 years ago

Hi @ganilmca, I tried to reproduce this issue by deploying a cluster with 3 nodes and did not get the error you mention. Could you share more steps or details on how to reproduce it?

ganilmca commented 3 years ago

Hi @xuezhou25

We tried redeploying the OpenSearch Elasticsearch cluster on other VMs, but we got the same issue: after restarting a host, we could not get the cluster health on that host. We got the same error as before:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception cluster:monitor/health"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception cluster:monitor/health"
  },
  "status" : 500
}

Can you please confirm one thing: did you deploy the cluster on VMs or with a Docker image?

Thank you.

xuezhou25 commented 3 years ago

> Can you please confirm one thing: did you deploy the cluster on VMs or with a Docker image?
>
> Thank you.

Sure, I deployed the cluster on a VM (Ubuntu). I did a tarball installation and modified opensearch.yml. Do you mean deploying 3 nodes on 3 VMs (with the same port and different IP addresses)?

ganilmca commented 3 years ago

@xuezhou25

Thanks for confirming. However, we cannot get the cluster health on all 3 hosts; we can get it on only one host. We redeployed on other IPs with port 9200, and we still got the same "security exception" issue. We edited the opensearch.yml file as shown below.

cluster.name: opensearch-elasticsearch
node.name: ${HOSTNAME}
path.data: /var/lib/scylla/elastic/opensearch/data
path.logs: /var/lib/scylla/elastic/opensearch/logs
network.host: x.x.x.x
http.port: 9200
transport.tcp.port: 9300
discovery.seed_hosts: ["x.x.x.x:9300","x.x.x.x:9300","x.x.x.x:9300"]
cluster.initial_master_nodes: ["x.x.x.x","x.x.x.x","x.x.x.x"]

Please have a look and share your opensearch.yml file; we will try with your yml file.

Please help us get this working.

Thank you.

xuezhou25 commented 3 years ago

My opensearch.yml file:

Node: 1

node.name: node-1
network.host: 192.168.0.3
discovery.seed_hosts: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]
cluster.initial_master_nodes: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]

Node: 2

node.name: node-2
network.host: 192.168.0.10
discovery.seed_hosts: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]
cluster.initial_master_nodes: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]

Node: 3

node.name: node-3
network.host: 192.168.0.11
discovery.seed_hosts: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]
cluster.initial_master_nodes: ["192.168.0.3", "192.168.0.10", "192.168.0.11"]

All other settings are left at their default values.
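
Since the two setups behave differently, it can help to list exactly which settings differ between the two opensearch.yml files. This is a minimal sketch in Python, assuming both files contain only flat `key: value` lines as in the snippets in this thread (it is not a general YAML parser); `parse_settings` and `diff_settings` are hypothetical helper names.

```python
def parse_settings(text: str) -> dict:
    """Parse flat 'key: value' lines; blank lines and # comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values like "x.x.x.x:9300" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a: dict, b: dict) -> dict:
    """Map each key that differs to (value in a, value in b); None marks a missing key."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Feeding it the failing config and the Node 1 config above highlights settings that appear only in the failing setup, such as cluster.name, path.data, path.logs, http.port and transport.tcp.port, which gives a short list of candidates to remove one at a time while testing.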

opensearch-project / OpenSearch

Opensearch elasticsearch cluster health couldn't able to get after node restarted #976