antoineberthelin opened this issue 9 years ago

Hello Jörg,
I tried to move my JDBC river to feeder mode, and it works with Elasticsearch on my machine, no problem. But I am having problems communicating with found.no.
Found.no has a specific connector: https://www.found.no/documentation/tutorials/using-java-transport/ Do you know if there is a solution for using the feeder with found.no? (I can patch your project to modify the transport module; it would work then, wouldn't it?)
Thanks for your help.
Antoine
The found.no transport module is open source (https://github.com/foundit/elasticsearch-transport-module), so I think I can implement a found.no JDBC feeder.
Hello Jörg,
Yes, it will be useful. This is found.no's answer: "The required changes to use found.no plugin should be minimal as it does not need any references to the code in our plugin. All that is required is that the plugin jar is on the classpath and that you get to set the required elasticsearch settings described in the documentation of the plugin. If the feeder currently does not let you set the settings you want, then I'm sure jprante will accept a pull request for that."
So Jörg, could you update the JDBC feeder to add this new settings parameter?
Thanks.
Antoine
Hello @jprante,
You have pushed some modifications to implement found.no communication in master. Could you tell me how I can use it, with an example if possible? Many thanks for your contribution, it's useful!
Antoine
Here is a short example for feeder mode:
{
    "elasticsearch" : {
        "cluster" : "elasticsearch",
        "host" : "host",
        "port" : 9300
    },
    "transport" : {
        "type" : "org.elasticsearch.transport.netty.FoundNettyTransport",
        "found" : {
            "api-key": "foobar"
        }
    },
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select *, page_id as _id from page",
        "treat_binary_as_string" : true,
        "index" : "metawiki"
    }
}
I hope it works; I cannot test it.
Hello @jprante
Tested and deployed to production 4 days ago. It works :)
Thanks and good job.
Antoine
Hi @jprante and @antoineberthelin. Nice to meet you guys. If you're using the default cluster name, the block
"elasticsearch" : { "cluster" : "elasticsearch", "host" : "host", "port" : 9300 }
is not needed. If you have custom cluster and host names, this element should go inside "jdbc", as sketched below.
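For example, a minimal sketch with placeholder values (untested, but matching the layout used later in this thread):
{
    "type" : "jdbc",
    "jdbc" : {
        "elasticsearch" : {
            "cluster" : "my_cluster",
            "host" : "my_host",
            "port" : 9300
        },
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from page",
        "index" : "my_index"
    }
}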
Regards, Alexander
Did anyone actually get this to work? I've tried every combination of host, port, and cluster settings I can imagine but keep getting an org.elasticsearch.client.transport.NoNodeAvailableException. I'm using elasticsearch-jdbc-1.6.0.0 because that's the version that matches up to my found.no cluster.
What follows is my configuration, using anonymized cluster IDs, users, passwords, IPs, etc. that are intended to at least look similar to the actual values. At the bottom is the log4j output after running ./run_reader.sh.
Help?
run_reader.sh ($JDBC_IMPORT_HOME is the path to elasticsearch-jdbc-1.6.0.0):
#!/usr/bin/env bash
if [ -z "$JDBC_IMPORT_HOME" ]; then
echo "ERROR: JDBC_IMPORT_HOME environment variable must be set";
exit 1
fi
JDBC_BIN="$JDBC_IMPORT_HOME/bin"
JDBC_LIB="$JDBC_IMPORT_HOME/lib"
# run the importer with every jar in lib/ on the classpath, passing the definition file
java \
  -cp "${JDBC_LIB}/*" \
  -Dlog4j.configurationFile="$JDBC_BIN/log4j2.xml" \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter \
  es-jdbc.json
log4j2.xml settings (I just used the default log4j2.xml file):
<?xml version="1.0" encoding="UTF-8"?>
<configuration status="OFF">
<appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="[%d{ABSOLUTE}][%-5p][%-25c][%t] %m%n"/>
</Console>
<File name="File" fileName="logs/jdbc.log" immediateFlush="true" append="true">
<PatternLayout pattern="[%d{ABSOLUTE}][%-5p][%-25c][%t] %m%n"/>
</File>
</appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="File" />
</Root>
<!-- set this level to trace to debug SQL value mapping -->
<Logger name="importer.jdbc.source.standard" level="info">
<appender-ref ref="Console"/>
</Logger>
<Logger name="metrics.source.plain" level="info">
<appender-ref ref="Console"/>
</Logger>
<Logger name="metrics.sink.plain" level="info">
<appender-ref ref="Console"/>
</Logger>
<Logger name="metrics.source.json" level="info">
<appender-ref ref="Console"/>
</Logger>
<Logger name="metrics.sink.json" level="info">
<appender-ref ref="Console"/>
</Logger>
</Loggers>
</configuration>
es-jdbc.json:
{
"type": "jdbc",
"jdbc": {
"elasticsearch": {
"host": "abc1234lph4num3r1cstring.us-east-1.aws.found.io",
"cluster": "abc1234lph4num3r1cstring",
"port": 9343
},
"transport": {
"type": "org.elasticsearch.transport.netty.FoundNettyTransport",
"found": {
"api-key": "AbcdApiKey1234"
}
},
"url": "jdbc:postgresql://ec1-1-1-1-1.compute-1.amazonaws.com:5442/myDatabase?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
"user": "myUser",
"password": "myDbPassword",
"sql": "select * from public.my_table",
"index": "my_index",
"type": "my_type",
"ignore_null_values": true
}
}
Notes:
- cluster: I've tried both the long cluster_name as shown above and the first 6 characters (abc123), which found.no's interface in one spot refers to, as in "Please provide the cluster ID 'abc123' when contacting support about your cluster."
- elasticsearch object: I've tried putting it outside of the jdbc object like this example demonstrated, but then it wouldn't recognize the elasticsearch object at all (even when using it for a non-transport based connection to a local ES cluster).
- Here is my found.no ACL:

default: deny
api_keys:
  - AbcdApiKey1234
auth:
  users:
    someuser: somepassword
rules:
  # Allow leader to do anything (for now) if authed
  - paths: ['.*']
    conditions:
      - basic_auth:
          users:
            - someuser
      - ssl:
          require: true
    action: allow
  # Allow the office access to anything without auth
  - paths: ['.*']
    conditions:
      - client_ip:
          ips:
            - 8.8.8.8 # actually my office's IP
      - ssl:
          require: false
    action: allow
jdbc.log:
[08:21:09,571][INFO ][importer.jdbc ][main] index name = my_index, concrete index name = my_index
[08:21:09,587][INFO ][importer.jdbc ][pool-2-thread-1] strategy standard: settings = {password=myDbPassword, user=myUser, elasticsearch.host=abc1234lph4num3r1cstring.us-east-1.aws.found.io, index=my_index, elasticsearch.port=9343, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, sql=select * from public.my_table, url=jdbc:postgresql://ec1-1-1-1-1.compute-1.amazonaws.com:5442/myDatabase?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory, ignore_null_values=true, type=lead, elasticsearch.cluster=abc1234lph4num3r1cstring, transport.found.api-key=AbcdApiKey1234}, context = org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext@353ec3bc
[08:21:09,589][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@2e83330
[08:21:09,593][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@41effb31
[08:21:09,622][INFO ][BaseTransportClient ][pool-2-thread-1] creating transport client, java version 1.8.0_45, effective settings {cluster.name=cab074, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
[08:21:09,663][INFO ][org.elasticsearch.plugins][pool-2-thread-1] [importer] loaded [support-1.6.0.0-d7bb0e9], sites []
[08:21:10,172][INFO ][BaseTransportClient ][pool-2-thread-1] trying to connect to [inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]]
[08:21:10,283][INFO ][org.elasticsearch.client.transport][pool-2-thread-1] [importer] failed to get node info for [#transport#-1][mklaber-ee-mbp.local][inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]], disconnecting...
org.elasticsearch.transport.NodeDisconnectedException: [][inet[abc1234lph4num3r1cstring.us-east-1.aws.found.io/X.1.X.3:9343]][cluster:monitor/nodes/info] disconnected
[08:21:10,286][ERROR][importer ][pool-2-thread-1] error while getting next input: no cluster nodes available, check settings {cluster.name=abc1234lph4num3r1cstring, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check settings {cluster.name=cab074, host.0=abc1234lph4num3r1cstring.us-east-1.aws.found.io, port=9343, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s, transport.type=org.elasticsearch.transport.netty.FoundNettyTransport, transport.found.api-key=AbcdApiKey1234}
at org.xbib.elasticsearch.support.client.BaseTransportClient.createClient(BaseTransportClient.java:53) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.support.client.BaseIngestTransportClient.newClient(BaseIngestTransportClient.java:22) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.support.client.transport.BulkTransportClient.newClient(BulkTransportClient.java:88) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext$1.create(StandardContext.java:440) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink.beforeFetch(StandardSink.java:94) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.beforeFetch(StandardContext.java:207) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.execute(StandardContext.java:188) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.tools.JDBCImporter.process(JDBCImporter.java:117) ~[elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.tools.Importer.newRequest(Importer.java:241) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.tools.Importer.newRequest(Importer.java:57) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:86) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:17) [elasticsearch-jdbc-1.6.0.0-uberjar.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
[08:21:10,290][WARN ][BulkTransportClient ][Thread-1] no client
Help?
The code of FoundNettyTransport changed to use another domain name, found.io, instead of foundcluster.com. I will update the copy of the code in the JDBC importer.
I have released JDBC importer 1.6.0.1 and 1.7.0.1 with an update, in the hope that FoundNettyTransport will work (can't test it).
@jprante can you push another -dist package up to http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/1.6.0.1/ ? It's missing the -dist.zip file...
I used the 1.6.0.0 distribution and replaced the uberjar with the uberjar for 1.6.0.1 to test things out (though it'd be helpful for automation purposes if a zip distribution was available for 1.6.0.1) and it seems to work.
For posterity (and anyone else who finds this thread): you need to use the long abc1234lph4num3r1cstring value for the cluster setting, rather than the abc123 that they refer to as the cluster ID.
Also, my ACL was messed up. After I loaded found.no's default ACL and added an API key, everything worked.
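For anyone skimming: the connection block that ended up working is the one from my es-jdbc.json above, with the long value in both places (values anonymized as before):
"elasticsearch": {
    "host": "abc1234lph4num3r1cstring.us-east-1.aws.found.io",
    "cluster": "abc1234lph4num3r1cstring",
    "port": 9343
}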
Thanks for the quick response @jprante
@mklaber Thanks for the testing, I hope the findings are useful for found users. I uploaded the binaries again, now with the dist zips included.
:+1:
Hi mklaber/jprante,
I am new to Elasticsearch. The JDBC importer plugin looks awesome.
It worked well on my local Windows machine, but when I tried the same on AWS (Linux), I am seeing the following exception:
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check settings
While searching the forums, I came across this thread.
How do I configure the ACLs for "found.no" that mklaber pasted? Do we need to create a file with the found.no ACL and place it somewhere?
Please help, and thanks in advance.
@kishoreactivity I just started with the default ACL they provided and then modified one setting at a time until it was reasonably secure and still working.
@mklaber Thanks for the prompt reply. Can you please point me to the default ACL file in the Elasticsearch installation directory? Under config, I have only the logging and elasticsearch.yaml files.
@kishoreactivity when I say default ACL I mean the default provided by found.no: https://www.elastic.co/guide/en/found/current/access-control.html
If something is broken, please let me know. The trick is that I copied the settings that are relevant for found.no into the JDBC importer. So if the settings in the Elastic/Found authentication API change, I have to follow those changes. Thanks to open source, this is possible.
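Concretely, a transport block like the one in the examples above, e.g.
"transport" : {
    "type" : "org.elasticsearch.transport.netty.FoundNettyTransport",
    "found" : {
        "api-key" : "AbcdApiKey1234"
    }
}
is flattened into the client settings transport.type=org.elasticsearch.transport.netty.FoundNettyTransport and transport.found.api-key=AbcdApiKey1234, which is exactly what shows up in the "effective settings" line of the log output above.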