Open sjl070707 opened 9 years ago
Is there something in the ES server logs?
Increasing liveness monitor timeout is only a last resort and will not solve the problem at its cause.
I get this message every 40 minutes or so, but indexing and uploading seem to continue despite it. I'm guessing that's because I have two data nodes (the disconnect message names one data node's IP, and it alternates between the two). I will leave it running until the index stops growing and see if it ends up loading the entire table. Meanwhile I will check the logs for anything unusual.
This is strange; I had several dozen of these messages while uploading 31 million documents from SQL Server, but it just kept going and eventually uploaded all the documents I needed.
I masked my real IP.
```
[21:47:16,057][INFO ][org.elasticsearch.client.transport][elasticsearch[importer][generic][T#114]] [importer] failed to get node info for {#transport#-1}{data1 node ip}{data1 node ip:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][data1 node ip:9300][cluster:monitor/nodes/liveness] request_id [42798] timed out after [5000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:645) ~[elasticsearch-jdbc-2.0.0.1-uberjar.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
[22:03:19,587][INFO ][org.elasticsearch.client.transport][elasticsearch[importer][generic][T#114]] [importer] failed to get node info for {#transport#-4}{client1 node ip142}{client1 node ip:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][client1 node ip:9300][cluster:monitor/nodes/liveness] request_id [45303] timed out after [5001ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:645) ~[elasticsearch-jdbc-2.0.0.1-uberjar.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
[22:37:02,765][WARN ][org.xbib.elasticsearch.support.client.transport.BulkTransportClient][Thread-1] no client
```
The message means that the ES cluster could not respond within 5 seconds; you should see some GC monitoring messages on the server side.
I doubt you could transfer all documents. The BulkTransportClient message "no client" is an emergency message meaning the transfer has been aborted.
Thank you Jörg.
60,343,976 documents with one replica, so 30,171,988 docs, created from 30,168,988 rows in the SQL table.
I ended up with 3,000 more documents(?), which is something I need to look into.
I was very careful with the settings so I would not exceed my server's capacity; according to Marvel, JVM heap usage never went over 70%.
"fetchsize" : 1000,
"flush_interval" : "1000s",
"max_retries" : 3, "max_retries_wait" : "30s", "max_bulk_actions" : 500, "max_bulk_volume" : "1m", "ignore_null_values" : true, "index" : "userdata", "type" : "userdata", "threadpoolsize" : 1
I investigated the issue further.
So we are starting from 30,171,988 docs created from 30,168,988 rows in the SQL table, i.e. 3,000 extra records in ES.
I used Python to verify both the SQL Server and ES sources and found that all data had been transferred to ES (great!), but 3,000 uids were duplicated, each appearing twice.
I am guessing these duplicates occurred while one of the data nodes was temporarily down for longer than 5 seconds.
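A quick way to spot such duplicates is to compare the id multisets from both sides; a minimal sketch (the id lists and the `find_duplicates` helper are illustrative, not from the original verification script):

```python
from collections import Counter

def find_duplicates(ids):
    """Return the ids that appear more than once, with their counts."""
    return {uid: n for uid, n in Counter(ids).items() if n > 1}

# Illustrative data: each uid occurs once in SQL, but one was indexed twice in ES.
sql_uids = [101, 102, 103]
es_uids = [101, 102, 102, 103]

print(find_duplicates(sql_uids))  # → {}
print(find_duplicates(es_uids))   # → {102: 2}
```

One common mitigation for retry-induced duplicates is to map a unique column to the Elasticsearch document `_id`, so a re-sent bulk action overwrites the same document instead of creating a new one.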
ES 2.0, ES JDBC 2.0.0.1
First of all, thank you for all the hard work. I've been using ES JDBC since 1.7 and it has always worked great.
I recently changed my topology along with the upgrade to ES 2.0.
I tried to import the same table that has always worked since 1.7.
After a while (very random, anywhere from 10 MB to 1 GB uploaded), I get a timeout / node-not-found message. I guess a node becomes unavailable for more than 5 seconds and the transport client times out: [cluster:monitor/nodes/liveness] request_id [631] timed out after [5000ms]
Is there a way to configure ES JDBC so that it waits more than 5 seconds when pinging the cluster?
Thanks