vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability
https://nebula-graph.io
Apache License 2.0

feat: ES Listener not supported in elasticsearch v8+ #5910

Open douglasrfaisal-gl opened 1 month ago

douglasrfaisal-gl commented 1 month ago

Describe the bug (required)

Your Environments (required)

How To Reproduce (required)

Steps to reproduce the behavior:

  1. Deploy the attached docker-compose file, which contains metad, storaged, listener, graphd, storage-activator, and elasticsearch (v8.14.1), by running `docker-compose up -d` in the directory containing the file.
  2. Run nebula-console, connect to NebulaGraph, and run the following queries (as demonstrated in Examples):
    
    // This example creates the graph space.
    nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

    // This example signs in the text service.
    nebula> SIGN IN TEXT SERVICE ("es":9200, HTTP);

    // This example checks the text service status.
    nebula> SHOW TEXT SEARCH CLIENTS;
    +-----------------+------+------+
    | Type            | Host | Port |
    +-----------------+------+------+
    | "ELASTICSEARCH" | "es" | 9200 |
    +-----------------+------+------+

    // This example switches the graph space.
    nebula> USE basketballplayer;

    // This example adds the listener to the NebulaGraph cluster.
    nebula> ADD LISTENER ELASTICSEARCH "listener0":9789;

    // This example checks the listener status. When the status is Online, the listener is ready.
    nebula> SHOW LISTENER;
    +--------+-----------------+--------------------+-------------+
    | PartId | Type            | Host               | Host Status |
    +--------+-----------------+--------------------+-------------+
    | 1      | "ELASTICSEARCH" | ""listener0":9789" | "ONLINE"    |
    | 2      | "ELASTICSEARCH" | ""listener0":9789" | "ONLINE"    |
    | 3      | "ELASTICSEARCH" | ""listener0":9789" | "ONLINE"    |
    +--------+-----------------+--------------------+-------------+

    // This example creates the tag.
    nebula> CREATE TAG IF NOT EXISTS player(name string, city string);

    // This example creates a single-attribute full-text index.
    nebula> CREATE FULLTEXT TAG INDEX fulltext_index_1 ON player(name) ANALYZER="standard";

    // This example creates a multi-attribute full-text index.
    nebula> CREATE FULLTEXT TAG INDEX fulltext_index_2 ON player(name,city) ANALYZER="standard";

    // This example rebuilds the full-text index.
    nebula> REBUILD FULLTEXT INDEX;

    // This example shows the full-text index.
    nebula> SHOW FULLTEXT INDEXES;
    +--------------------+-------------+-------------+--------------+------------+
    | Name               | Schema Type | Schema Name | Fields       | Analyzer   |
    +--------------------+-------------+-------------+--------------+------------+
    | "fulltext_index_1" | "Tag"       | "player"    | "name"       | "standard" |
    | "fulltext_index_2" | "Tag"       | "player"    | "name, city" | "standard" |
    +--------------------+-------------+-------------+--------------+------------+

    // This example inserts the test data.
    nebula> INSERT VERTEX player(name, city) VALUES \
      "Russell Westbrook": ("Russell Westbrook", "Los Angeles"), \
      "Chris Paul": ("Chris Paul", "Houston"), \
      "Boris Diaw": ("Boris Diaw", "Houston"), \
      "David West": ("David West", "Philadelphia"), \
      "Danny Green": ("Danny Green", "Philadelphia"), \
      "Tim Duncan": ("Tim Duncan", "New York"), \
      "James Harden": ("James Harden", "New York"), \
      "Tony Parker": ("Tony Parker", "Chicago"), \
      "Aron Baynes": ("Aron Baynes", "Chicago"), \
      "Ben Simmons": ("Ben Simmons", "Phoenix"), \
      "Blake Griffin": ("Blake Griffin", "Phoenix");

    // These examples run test queries.
    nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Chris") YIELD id(vertex);

  3. The last query returns an empty result:

    +------------+
    | id(VERTEX) |
    +------------+
    +------------+


**Expected behavior**

The query should return the matching player vertex:

    +--------------+
    | id(VERTEX)   |
    +--------------+
    | "Chris Paul" |
    +--------------+


**Additional context**

Found the following error in the listener container:

E20240718 08:40:14.494640 101 ESListener.cpp:65] {"reason":"Action/metadata line [1] contains an unknown parameter [_type]","type":"illegal_argument_exception","root_cause":[{"reason":"Action/metadata line [1] contains an unknown parameter [_type]","type":"illegal_argument_exception"}]}


This is probably related to the removal of mapping types: the error shows Elasticsearch rejecting the `_type` parameter that the listener sends during data ingestion. [Link to Elasticsearch's documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/removal-of-types.html#_why_are_mapping_types_being_removed).
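To make the error message concrete, here is a minimal sketch of the two lines that make up one bulk "index" action, with and without `_type`. The exact request the listener sends is not shown in this thread, so the index name `demo_index`, the document, and the helper `bulk_index_lines` are all hypothetical; only the general NDJSON shape of the Elasticsearch bulk API is assumed.

```python
import json


def bulk_index_lines(index, doc_id, source, doc_type=None):
    """Return the two NDJSON lines for one bulk "index" action.

    Hypothetical helper for illustration only; it is not NebulaGraph code.
    """
    meta = {"_index": index, "_id": doc_id}
    if doc_type is not None:
        # Legacy form: the action/metadata line carries "_type".
        # This is the parameter named in the listener's error log.
        meta["_type"] = doc_type
    return [json.dumps({"index": meta}), json.dumps(source)]


doc = {"name": "Chris Paul"}

# Action line with "_type" (the form the error message complains about).
legacy = bulk_index_lines("demo_index", "1", doc, doc_type="_doc")

# Typeless action line (no "_type" in the metadata).
typeless = bulk_index_lines("demo_index", "1", doc)

print("\n".join(legacy))
print("\n".join(typeless))
```

The first pair of lines is the kind of action/metadata line that the log entry above reports as containing an unknown parameter; the second pair omits it.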

**Attached File**

docker-compose.yml
```yaml
version: '3.4'
services:
  metad0:
    image: docker.io/vesoft/nebula-metad:v3.8.0
    environment:
      USER: root
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=metad0
      - --ws_ip=metad0
      - --port=9559
      - --ws_http_port=19559
      - --data_path=/data/meta
      # - --log_dir=/logs
      # log to stderr not file
      - --logtostderr=true
      - --redirect_stdout=false
      # log to stderr not file
      - --v=0
      - --minloglevel=0
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://metad0:19559/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
    ports:
      - 9559:9559
      - 19559:19559
      - 19560
    volumes:
      - ./data/meta0:/data/meta
      # - ./logs/meta0:/logs
    networks:
      - nebula-net
    restart: on-failure
    cap_add:
      - SYS_PTRACE

  storaged0:
    image: docker.io/vesoft/nebula-storaged:v3.8.0
    environment:
      USER: root
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=storaged0
      - --ws_ip=storaged0
      - --port=9779
      - --ws_http_port=19779
      - --data_path=/data/storage
      # - --log_dir=/logs
      # log to stderr not file
      - --logtostderr=true
      - --redirect_stdout=false
      # log to stderr not file
      - --v=0
      - --minloglevel=0
    depends_on:
      - metad0
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://storaged0:19779/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
    ports:
      - 9779:9779
      - 19779:19779
      - 19780
    volumes:
      - ./data/storage0:/data/storage
      # - ./logs/storage0:/logs
    networks:
      - nebula-net
    restart: on-failure
    cap_add:
      - SYS_PTRACE

  listener0:
    image: docker.io/vesoft/nebula-storaged:v3.8.0
    environment:
      USER: root
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=listener0
      - --ws_ip=listener0
      - --port=9789
      - --ws_http_port=19789
      - --heartbeat_interval_secs=10
      - --listener_path=/data/listener
      # - --log_dir=/logs
      # log to stderr not file
      - --logtostderr=true
      - --redirect_stdout=false
      # log to stderr not file
      - --v=0
      - --minloglevel=0
    depends_on:
      - metad0
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://listener0:19789/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
    ports:
      - 9789:9789
      - 19789:19789
      - 19790
    volumes:
      - ./data/listener0:/data/listener
      # - ./logs/storage0:/logs
    networks:
      - nebula-net
    restart: on-failure
    cap_add:
      - SYS_PTRACE

  graphd:
    image: docker.io/vesoft/nebula-graphd:v3.8.0
    environment:
      USER: root
    command:
      - --meta_server_addrs=metad0:9559
      - --port=9669
      - --local_ip=graphd
      - --ws_ip=graphd
      - --ws_http_port=19669
      # - --log_dir=/logs
      # log to stderr not file
      - --logtostderr=true
      - --redirect_stdout=false
      # log to stderr not file
      - --v=0
      - --minloglevel=0
    depends_on:
      - storaged0
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://graphd:19669/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
    ports:
      - 9669:9669
      - 19669:19669
      - 19670
    # volumes:
    #   - ./logs/graph:/logs
    networks:
      - nebula-net
    restart: on-failure
    cap_add:
      - SYS_PTRACE

  storage-activator:
    # This is just a script to activate storaged for the first time run by calling nebula-console
    # Refer to https://docs.nebula-graph.io/master/4.deployment-and-installation/manage-storage-host/#activate-storaged
    # If you like to call console via docker, run:

    # docker run --rm -ti --network host vesoft/nebula-console:nightly -addr 127.0.0.1 -port 9669 -u root -p nebula

    image: docker.io/vesoft/nebula-console:v3.8.0
    entrypoint: ""
    environment:
      ACTIVATOR_RETRY: ${ACTIVATOR_RETRY:-30}
    command: 
      - sh
      - -c
      - |
        for i in `seq 1 $$ACTIVATOR_RETRY`; do
          nebula-console -addr graphd -port 9669 -u root -p nebula -e 'ADD HOSTS "storaged0":9779' 1>/dev/null 2>/dev/null;
          if [[ $$? == 0 ]]; then
            echo "✔️ Storage activated successfully.";
            break;
          else
            output=$$(nebula-console -addr graphd -port 9669 -u root -p nebula -e 'ADD HOSTS "storaged0":9779' 2>&1);
            if echo "$$output" | grep -q "Existed"; then
              echo "✔️ Storage already activated, Exiting...";
              break;
            fi
          fi;
          if [[ $$i -lt $$ACTIVATOR_RETRY ]]; then
            echo "⏳ Attempting to activate storaged, attempt $$i/$$ACTIVATOR_RETRY... It's normal to take some attempts before storaged is ready. Please wait.";
          else
            echo "❌ Failed to activate storaged after $$ACTIVATOR_RETRY attempts. Please check MetaD, StorageD logs. Or restart the storage-activator service to continue retry.";
            echo "ℹ️ Error during storage activation:"
            echo "=============================================================="
            echo "$$output"
            echo "=============================================================="
            break;
          fi;
          sleep 5;
        done && tail -f /dev/null;

    depends_on:
      - graphd
    networks:
      - nebula-net

  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.14.1
    environment:
      - node.name=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
      - xpack.security.enabled=false
      - network.host=0.0.0.0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    networks:
      - nebula-net

networks:
  nebula-net:
    driver: bridge

volumes:
  esdata1:
    driver: local
```

wey-gu commented 1 month ago

Dear @douglasrfaisal-gl

Indeed, this feature was implemented on the assumption that mapping types (`_type`) had not been removed.

So for now only Elasticsearch 7.x is supported. I just noticed this is not documented; I will fix the docs at https://docs.nebula-graph.io/3.8.0/4.deployment-and-installation/6.deploy-text-based-index/2.deploy-es/.

In the future we could refactor the mapping to drop `_type` so that newer Elasticsearch versions work.

Thanks!
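As a rough illustration of what "removing `_type`" means at the request level, the sketch below strips `_type` from the action/metadata lines of a bulk body while leaving document lines untouched. This is not the listener's code (the real change would live in the C++ listener); `strip_type` and the sample body are hypothetical.

```python
import json

# Bulk actions that are followed by a body line (source or partial doc).
ACTIONS_WITH_BODY = {"index", "create", "update"}


def strip_type(bulk_body):
    """Drop "_type" from every action/metadata line of an NDJSON bulk body."""
    out = []
    expect_body = False
    for line in bulk_body.splitlines():
        if not line.strip():
            continue
        if expect_body:
            # Document line: pass through unchanged, even if it has a "_type" field.
            out.append(line)
            expect_body = False
            continue
        action = json.loads(line)
        # An action line is an object with exactly one key, e.g. {"index": {...}}.
        ((name, meta),) = action.items()
        meta.pop("_type", None)
        out.append(json.dumps({name: meta}))
        expect_body = name in ACTIONS_WITH_BODY
    # A bulk body must end with a newline.
    return "\n".join(out) + "\n"


sample = (
    '{"index": {"_index": "demo_index", "_type": "_doc", "_id": "1"}}\n'
    '{"name": "Chris Paul", "_type": "kept"}\n'
    '{"delete": {"_index": "demo_index", "_type": "_doc", "_id": "2"}}\n'
)
cleaned = strip_type(sample)
print(cleaned, end="")
```

Tracking whether the next line is a document matters because only action lines should be rewritten; a field named `_type` inside a document is user data and is left alone here.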

douglasrfaisal-gl commented 1 month ago

Hi, @wey-gu

Thanks! I can confirm that using Elasticsearch 7.x resolved my issue.
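Since the outcome of this thread is that only Elasticsearch 7.x works with the listener in this release, a small pre-flight check could catch the mismatch before running `SIGN IN TEXT SERVICE`. The sketch below parses the JSON that Elasticsearch returns from its root endpoint (`GET /`), which includes `version.number`. The sample strings are trimmed, hand-written responses rather than captured output, and the function names are hypothetical.

```python
import json

# Taken from this thread: only Elasticsearch 7.x is supported for now.
SUPPORTED_MAJOR = 7


def es_major_version(root_response):
    """Extract the major version from the JSON body of `GET /`."""
    info = json.loads(root_response)
    return int(info["version"]["number"].split(".")[0])


def is_supported(root_response):
    """Return True if the reported major version matches SUPPORTED_MAJOR."""
    return es_major_version(root_response) == SUPPORTED_MAJOR


# Hand-written, trimmed sample responses (assumptions, not real captures).
es8_response = '{"name": "es01", "version": {"number": "8.14.1"}}'
es7_response = '{"name": "es01", "version": {"number": "7.17.0"}}'

print(is_supported(es8_response))
print(is_supported(es7_response))
```

In a real deployment the response body would come from an HTTP request to the Elasticsearch host, for example `es:9200` in the compose file above.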