jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0
2.84k stars 709 forks

Feeder is supported with found.no ? #536

Open antoineberthelin opened 9 years ago

antoineberthelin commented 9 years ago

Hello Jörg,

I am trying to move my JDBC river to the feeder, and it works with Elasticsearch on my machine, no problem. But I have some problems communicating with found.no.

Found.no has a specific connector: https://www.found.no/documentation/tutorials/using-java-transport/ Do you know if there is a way to use the feeder with found.no? (I can patch your project to modify the transport module; it will work, won't it?)

Thanks for your help.

Antoine

jprante commented 9 years ago

The found.no transport module is open source (https://github.com/foundit/elasticsearch-transport-module), so I think I can implement a found.no JDBC feeder.

antoineberthelin commented 9 years ago

Hello Jörg,

Yes, it would be useful. This is found.no's answer: "The required changes to use found.no plugin should be minimal as it does not need any references to the code in our plugin. All that is required is that the plugin jar is on the classpath and that you get to set the required elasticsearch settings described in the documentation of the plugin. If the feeder currently does not let you set the settings you want, then I'm sure jprante will accept a pull request for that."

So Jörg, could you update the JDBC feeder to add this new setting parameter?

Thanks.

Antoine

antoineberthelin commented 9 years ago

Hello @jprante,

You have pushed some modifications to master to implement found.no communication. Could you tell me how I can use them, with an example if possible? Many thanks for your contribution, it's useful!

Antoine

jprante commented 9 years ago

Here is a short example for feeder mode:

{
    "elasticsearch" : {
         "cluster" : "elasticsearch",
         "host" : "host",
         "port" : 9300
    },
    "transport" : {
        "type" : "org.elasticsearch.transport.netty.FoundNettyTransport"
        "found" : {
            "api-key": "foobar"
        }
    },
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" :  "select *, page_id as _id from page",
        "treat_binary_as_string" : true,
        "index" : "metawiki"
      }
}

I hope it works. I cannot test it.

antoineberthelin commented 9 years ago

Hello @jprante

Tested and delivered to production four days ago. It works :)

Thanks and good job.

Antoine

jjoewy commented 9 years ago

Hi @jprante and @antoineberthelin. Nice to meet you guys. If you're using the default cluster name, the block:

"elasticsearch" : {
    "cluster" : "elasticsearch",
    "host" : "host",
    "port" : 9300
}

is not needed. If you have custom cluster and host names, this element should go inside "jdbc".
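
To illustrate Alexander's point, here is a minimal sketch of where the block would sit for a custom cluster. The host and cluster values are hypothetical placeholders, and the "jdbc" settings are shortened from Jörg's earlier example:

```
{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch" : {
      "cluster" : "my-custom-cluster",
      "host" : "my-host",
      "port" : 9300
    },
    "url" : "jdbc:mysql://localhost:3306/test",
    "sql" : "select *, page_id as _id from page",
    "index" : "metawiki"
  }
}
```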

Regards, Alexander

mklaber commented 9 years ago

Did anyone actually get this to work? I've tried every combination of host, port, and cluster settings I can imagine, but I keep getting an org.elasticsearch.client.transport.NoNodeAvailableException. I'm using elasticsearch-jdbc-1.6.0.0 because that's the version that matches my found.no cluster.

What follows is my configuration, using anonymized cluster IDs, users, passwords, IPs, etc. that are intended to at least look similar to the actual values. At the bottom is the log4j output after running ./run_reader.sh.

Help?

run_reader.sh

$JDBC_IMPORT_HOME is the path to elasticsearch-jdbc-1.6.0.0

#!/usr/bin/env bash

if [ -z "$JDBC_IMPORT_HOME" ]; then
  echo "ERROR: JDBC_IMPORT_HOME environment variable must be set";
  exit 1
fi

JDBC_BIN="$JDBC_IMPORT_HOME/bin"
JDBC_LIB="$JDBC_IMPORT_HOME/lib"

java \
  -cp "${JDBC_LIB}/*" \
  -Dlog4j.configurationFile="$JDBC_BIN/log4j2.xml" \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter \
  es-jdbc.json

log4j2.xml settings

I just used the default log4j2.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration status="OFF">
    <appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="[%d{ABSOLUTE}][%-5p][%-25c][%t] %m%n"/>
        </Console>
        <File name="File" fileName="logs/jdbc.log" immediateFlush="true"  append="true">
            <PatternLayout pattern="[%d{ABSOLUTE}][%-5p][%-25c][%t] %m%n"/>
        </File>
    </appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="File" />
        </Root>
        <!-- set this level to trace to debug SQL value mapping -->
        <Logger name="importer.jdbc.source.standard" level="info">
            <appender-ref ref="Console"/>
        </Logger>
        <Logger name="metrics.source.plain" level="info">
            <appender-ref ref="Console"/>
        </Logger>
        <Logger name="metrics.sink.plain" level="info">
            <appender-ref ref="Console"/>
        </Logger>
        <Logger name="metrics.source.json" level="info">
            <appender-ref ref="Console"/>
        </Logger>
        <Logger name="metrics.sink.json" level="info">
            <appender-ref ref="Console"/>
        </Logger>
    </Loggers>
</configuration>

es-jdbc.json

{
  "type": "jdbc",
  "jdbc": {
    "elasticsearch": {
      "host": "abc1234lph4num3r1cstring.us-east-1.aws.found.io",
      "cluster": "abc1234lph4num3r1cstring",
      "port": 9343
    },
    "transport": {
      "type": "org.elasticsearch.transport.netty.FoundNettyTransport",
      "found": {
        "api-key": "AbcdApiKey1234"
      }
    },
    "url": "jdbc:postgresql://ec1-1-1-1-1.compute-1.amazonaws.com:5442/myDatabase?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
    "user": "myUser",
    "password": "myDbPassword",
    "sql": "select * from public.my_table",
    "index": "my_index",
    "type": "my_type",
    "ignore_null_values": true
  }
}

Notes:

found.no ACL

default: deny

api_keys:
  - AbcdApiKey1234

auth:
  users:
    someuser: somepassword

rules:
  # Allow leader to do anything (for now) if authed
  - paths: ['.*']
    conditions:
      - basic_auth:
          users:
             - someuser
      - ssl:
          require: true
    action: allow
  # Allow the office access to anything without auth
  - paths: ['.*']
    conditions:
      - client_ip:
        ips:
          - 8.8.8.8 # actually my office's IP
      - ssl:
        require: false
    action: allow

log4j Output (jdbc.log)

[08:21:09,571][INFO ][importer.jdbc            ][main] index name = my_index, concrete index name = my_index
[08:21:09,587][INFO ][importer.jdbc            ][pool-2-thread-1] strategy standard: settings = {password=myDbPassword, user=myUser, elasticsearch.host=abc1234lph4num3r1cstring.us-east-1.aws.found.io, index=my_index, elasticsearch.port=9343, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, sql=select * from public.my_table, url=jdbc:postgresql://ec1-1-1-1-1.compute-1.amazonaws.com:5442/myDatabase?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory, ignore_null_values=true, type=lead, elasticsearch.cluster=abc1234lph4num3r1cstring, transport.found.api-key=AbcdApiKey1234}, context = org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext@353ec3bc
[08:21:09,589][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@2e83330
[08:21:09,593][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@41effb31
[08:21:09,622][INFO ][BaseTransportClient      ][pool-2-thread-1] creating transport client, java version 1.8.0_45, effective settings {cluster.name=cab074, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
[08:21:09,663][INFO ][org.elasticsearch.plugins][pool-2-thread-1] [importer] loaded [support-1.6.0.0-d7bb0e9], sites []
[08:21:10,172][INFO ][BaseTransportClient      ][pool-2-thread-1] trying to connect to [inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]]
[08:21:10,283][INFO ][org.elasticsearch.client.transport][pool-2-thread-1] [importer] failed to get node info for [#transport#-1][mklaber-ee-mbp.local][inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]], disconnecting...
org.elasticsearch.transport.NodeDisconnectedException: [][inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]][cluster:monitor/nodes/info] disconnected
[08:21:10,286][ERROR][importer                 ][pool-2-thread-1] error while getting next input: no cluster nodes available, check settings {cluster.name=abc1234lph4num3r1cstring, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check settings {cluster.name=cab074, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
    at org.xbib.elasticsearch.support.client.BaseTransportClient.createClient(BaseTransportClient.java:53) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.support.client.BaseIngestTransportClient.newClient(BaseIngestTransportClient.java:22) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.support.client.transport.BulkTransportClient.newClient(BulkTransportClient.java:88) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext$1.create(StandardContext.java:440) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink.beforeFetch(StandardSink.java:94) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.beforeFetch(StandardContext.java:207) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.execute(StandardContext.java:188) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.tools.JDBCImporter.process(JDBCImporter.java:117) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.tools.Importer.newRequest(Importer.java:241) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.tools.Importer.newRequest(Importer.java:57) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:86) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:17) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_45]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
[08:21:10,290][WARN ][BulkTransportClient      ][Thread-1] no client

Help?

jprante commented 9 years ago

The code of FoundNettyTransport changed to use another domain name, found.io, instead of foundcluster.com. I will update the copy of the code in the JDBC importer.

jprante commented 9 years ago

I have released JDBC importer 1.6.0.1 and 1.7.0.1 with the update, in the hope that FoundNettyTransport will work (I can't test it).

mklaber commented 9 years ago

@jprante can you push another -dist package up to http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/1.6.0.1/ ? it's missing the -dist.zip file...

mklaber commented 9 years ago

I used the 1.6.0.0 distribution and replaced the uberjar with the one from 1.6.0.1 to test things out, and it seems to work (though it'd be helpful for automation purposes if a zip distribution were available for 1.6.0.1).

For posterity (and anyone else who finds this thread): you need to use the long abc1234lph4num3r1cstring value for the cluster, rather than the short abc123 value that they refer to as the cluster ID.

Also, my ACL was messed up. After I loaded found.no's default ACL and added an API key, everything worked.
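
In config terms, the gotcha above means the "cluster" setting needs the full cluster string that also appears in the hostname, not the short ID. Using the anonymized placeholder values from this thread, the working block would look like:

```
"elasticsearch": {
  "cluster": "abc1234lph4num3r1cstring",
  "host": "abc1234lph4num3r1cstring.us-east-1.aws.found.io",
  "port": 9343
}
```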

Thanks for the quick response @jprante

jprante commented 9 years ago

@mklaber Thanks for the testing, I hope the findings are useful for found users. I uploaded the binaries again, now with the dist zips included.

mklaber commented 9 years ago

:+1:

kishoreactivity commented 9 years ago

Hi mklaber/jprante,

I am new to Elasticsearch. The JDBC importer plugin looks awesome.

It worked well on my local Windows machine, but when I tried the same on AWS (Linux), I am seeing the following exception:

org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check settings

While searching the forums, I saw this thread.

How do I configure the ACLs for found.no that mklaber pasted? Do we need to create a file and place it in \config?

Please help and thanks in advance.

mklaber commented 9 years ago

@kishoreactivity I just started with the default ACL they provided and then modified one setting at a time until it was reasonably secure and still working.

kishoreactivity commented 9 years ago

@mklaber Thanks for the prompt reply. Can you please point me to the default ACL file in the Elasticsearch installation directory? Under config, I have only the logging and elasticsearch.yaml files.

mklaber commented 9 years ago

@kishoreactivity when I say default ACL I mean the default provided by found.no: https://www.elastic.co/guide/en/found/current/access-control.html

jprante commented 9 years ago

If something is broken, please let me know. The trick is that I copied the settings relevant for found.no into the JDBC importer, so if the settings in the Elastic/Found authentication API change, I have to follow those changes. Thanks to open source, this is possible.