prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.92k stars 5.33k forks source link

How to connect with elastic search using presto when elasticsearch run on remote server #12668

Open shubham-agarwal-ngi opened 5 years ago

shubham-agarwal-ngi commented 5 years ago

I have created the elastic search mapping json file with the following details

{
    "tableName": "test",
    "schemaName": "es",
    "host": "XX.XXX.XX.XXX",
    "port": 1234,
    "clusterName": "elasticsearch",
    "index": "test",
    "type": "user",
    "columns": [{
            "name": "GENDER",
            "type": "varchar",
            "jsonPath": "GENDER",
            "jsonType": "varchar",
            "ordinaryPosition": "0"
        },
        {
            "name": "NAME",
            "type": "varchar",
            "jsonPath": "NAME",
            "jsonType": "varchar",
            "ordinaryPosition": "1"
        },
        {
            "name": "price",
            "type": "varchar",
            "jsonPath": "price",
            "jsonType": "varchar",
            "ordinaryPosition": "2"
        }
    ]
}

I have updated some values in elasticsearch.yml file

cluster.name: elasticsearch
node.name: elasticsearch-node-1
transport.tcp.port: 1234
network.host: [localhost,127.0.0.1]
http.port: 9200

My elastic search run successfully on http://XX.XXX.XX.XXX:9200 but when we hit to elasticsearch using presto to fetch the data from particular index, it has given the following error:

java.lang.RuntimeException: NoNodeAvailableException[None of the configured nodes are available

please advice.

rschlussel commented 5 years ago

I don't have much experience with ElasticSearch, but I see you set the port in the mapping file to 1234, which is the transport port for internal communication. I think it should be set to the http port of 9200.

shubham-agarwal-ngi commented 5 years ago

I have resolved this query.

we should mention the another parameter also in elasticsearch.yml file .i.e network.publish_host:

Thanks

yinsk commented 5 years ago

@shubham-agarwal-ngi Hi, so did you solve this problem? How? I just encountered the same problem.

wenleix commented 5 years ago

cc @zhenxiao

jwfcps commented 5 years ago

You must use the transport port (usually 9300), not the http port (9200).

zhenxiao commented 5 years ago

@shubham-agarwal-ngi @yinsk the following message:

java.lang.RuntimeException: NoNodeAvailableException[None of the configured nodes are available

means could not connect to the configured Elasticsearch, it could be port number incorrect, host not available, cluster name incorrect.

As @jwfcps mentioned, we usually use 9300 as Elasticsearch transport port. Could you please take a try?

xeob commented 5 years ago

I have a similar problem. Below are the steps to reproduce (Mojave, OS X), presto@2.220 from brew, Java 8.

  1. Pull an Elasticsearch docker image nad run it docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.6.2

  2. Create a users index

    PUT users
  3. Set the users mapping

    PUT users/_mapping/_doc
    {
    "properties": {
    "key": {
      "type": "keyword"
    },
    "username": {
      "type": "keyword"
    },
    "email": {
      "type": "keyword"
    }
    }
    }
  4. Create a dummy doc

    POST users/_doc/test
    {
    "key": "test",
    "username": "test_username",
    "email": "test@test.com"
    }
  5. Create $PRESTO_HOME/etc/catalog/elasticsearch.properties

    connector.name=elasticsearch
    elasticsearch.scroll-size=1000
    elasticsearch.request-timeout=2s
    elasticsearch.max-request-retries=5
    elasticsearch.max-request-retry-time=10s
  6. Create $PRESTO_HOME/etc/elasticsearch/my_schema.users.json

    {
    "tableName": "users",
    "schemaName": "my_schema",
    "host": "localhost",
    "port": "9300",
    "clusterName": "docker-cluster",
    "index": "users",
    "type": "doc",
    "columns": [
        {
            "name": "key",
            "type": "varchar",
            "jsonPath": "key",
            "jsonType": "varchar",
            "ordinalPosition": "0"
        }
    ]
    }
  7. Run the server with presto-server run

  8. Connect presto with

    $ presto --catalog elasticsearch --schema my_schema;
  9. Run the query

    SELECT * FROM users LIMIT 1;

Query runs for ~30s.

presto:my_schema> SELECT * FROM users LIMIT 1;

Query 20190524_144525_00004_z4rru, RUNNING, 1 node, 22 splits
0:12 [    0 rows,     0B] [    0 rows/s,     0B/s] [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 0%

     STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED    RUN   DONE
0.........R      0       0     0B       0B       0     17      0
  1.......R      0       0     0B       0B       0      5      0

Then it errors out with the following message

Query 20190524_144525_00004_z4rru, FAILED, 1 node
Splits: 22 total, 0 done (0.00%)
0:34 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20190524_144525_00004_z4rru failed: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{hFFnQ5c-QAG9QUMVu2TI8w}{172.17.0.2}{172.17.0.2:9300}]]

The query does not result in an error only when there is no users index.

presto:my_schema> SELECT * FROM users LIMIT 1;
 key | username | email
-----+----------+-------
(0 rows)

Query 20190524_144354_00003_z4rru, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Any help would be appreciated.

minhnguyenvato commented 5 years ago

Got the same error

minhnguyenvan95 commented 5 years ago

Just resolved it , You should use transport.publish_host=your-ip-address and verify that when you using curl localhost:9200/_nodes the transport_address should be your ip and clusterName should match your config or default is elasticsearch

xeob commented 5 years ago

Thanks, makes sense! I need to see if it's possible to replace the TransportClient with a HighLevelRestClient, as I don't have clusters with the transport published host in production. We're migrating to Elasticsearch@7.x soon, and it looks like TransportClient is deprecated anyway.

minhnguyenvan95 commented 5 years ago

You can use minukube then using helm to install your es cluster in local

Vào Th 2, 27 thg 5, 2019 lúc 17:56 Kamil Nikel notifications@github.com đã viết:

Thanks, makes sense! I need to see if it's possible to replace the TransportClient with a HighLevelRestClient, as I don't have clusters with the transport published host in production. We're migrating to Elasticsearch@7.x soon, and it looks like TransportClient is deprecated https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html anyway.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/prestodb/presto/issues/12668?email_source=notifications&email_token=ADNHIIWQ7AYROUHERBYYC7LPXO45HA5CNFSM4HGB2PN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWJPY3Y#issuecomment-496172143, or mute the thread https://github.com/notifications/unsubscribe-auth/ADNHIIREJTL2DOEUQFKHTXDPXO45HANCNFSM4HGB2PNQ .

--

Best Regards, Minh Nguyen +84 993-178-816 minh.nguyenvan95@gmail.com

minhnguyenvato commented 5 years ago

@knikel high-level elasticsearch could be a good option . I run an elasticsearch in k8s then using kubectl port-forward but my presto elasticsearch connector can't connect to it through low-level rest . I think high-level elasticsearch could be a better choice.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high.html https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-low.html

zhenxiao commented 5 years ago

@knikel @minhnguyenvato good point, java high level is a good option when I was implementing the Presto Elasticsearch Connector, I was working on Elasticsearch 6.0. Will file a following task to migrate to java high level elasticsearch

xeob commented 5 years ago

@zhenxiao: I'm willing to take a stab at migrating the TransportClient to HighLevelRestClient at some point in June. However, I'm not sure what should happen with the SearchGuard plugin. The plugin is meant to work with TransportClient. I did research the options a bit, and it feels like the SSL setup could look as suggested in this issue. I have zero experience with SG so that I might need some help there.

zhenxiao commented 5 years ago

sure, you are welcome to do it @knikel Two things for consideration,

  1. Compatibility. for Elasticsearch, 2.x is not compatible with 6.x, which is compatible with 5.x. It would be nice if HighLevelRestClient could be backward compatible with 6.x and 5.x.
  2. Performance. When designing and implementing the Elasticsearch connector, we try to maximize data parallelism, so leveraging TransportClient to scan data parallel from multiple search nodes via shard. Scan data from one single search node would have performance issues, as the network bandwidth between one single search node and worker is limited. just ping me for review when you get it ready, or need any help
xeob commented 5 years ago

@zhenxiao: I have a working solution on this branch (I'll create a proper PR next week, here's the draft: https://github.com/prestodb/presto/pull/12889). I'm worried it'll be very hard to keep the 5.x compatibility. I'm using Slice Scroll (introduced in 5.5) to distribute parts of the scroll over shards (split -> shard). In my opinion, this feature has been only in 5.5 and 5.6 version of the 5.x release so it's not worth the effort to keep backward compatibility with 5.x. The other problem is the lack of custom document types, in 6.x everything is a doc so there would be additional effort required to maintain how doc types are stored, etc. However, we would be able to ensure 6.x and 7.x compatibility. From my side what's left to do is to bring the SSL back somehow, clean up the code and possibly write some tests.

zhenxiao commented 5 years ago

sure. looking forward to ur PR. let's continue the discussion in the PR

Trofym commented 5 years ago

@zhenxiao Does Presto 318 work with Elasticsearch version 7.3.1? I can't connect and get this error

Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `io.prestosql.elasticsearch.ElasticsearchTableDescription`, problem: host is null or empty
 at [Source: (byte[])"{
    "tableName": "my_table",
    "schemaName": "default",
    "hostAddress": "127.0.0.1",
    "port": "9300",
    "clusterName": "my_cluster",
    "index": "my_index",
    "indexExactMatch": "false",
    "type": "doc",
    "columns": [
        {
            "name": "field",
            "type": "varchar",
            "jsonPath": "field",
            "jsonType": "varchar",
            "ordinalPosition": "0"
        }
    ]
}
findepi commented 5 years ago

Does Presto 318 work with Elasticsearch version 7.3.1?

@Trofym Presto 3xx releases are maintained at https://github.com/prestosql/presto. Also, there is the #troubleshooting channel on Presto Community Slack (https://trino.io/community.html)

zhenxiao commented 5 years ago

@Trofym seems the hostname could not be resolved. Is it 127.0.0.1, did you try localhost?

Trofym commented 5 years ago

@Trofym seems the hostname could not be resolved. Is it 127.0.0.1, did you try localhost?

Yes, I have tried localhost, but no difference

fabienbarbaud commented 4 years ago

@Trofym seems the hostname could not be resolved. Is it 127.0.0.1, did you try localhost?

Yes, I have tried localhost, but no difference

You should use "host" instead of "hostAddress".

iceted commented 4 years ago

If your using the latest version the host property isn't needed in the table json any more: https://prestosql.io/docs/current/connector/elasticsearch.html

funston commented 4 years ago

unclear how this works, i keep getting this version mismatch with ES 7.1 and 0.234 presto. which version of ES are you using?

"Received message from unsupported version: [6.0.0] minimal compatible version is: [6.8.0]"

minhnguyenvan95 commented 4 years ago

unclear how this works, i keep getting this version mismatch with ES 7.1 and 0.234 presto. which version of ES are you using?

"Received message from unsupported version: [6.0.0] minimal compatible version is: [6.8.0]"

try to upgrade your es driver in presto category config