yakaz / elasticsearch-action-updatebyquery

ElasticSearch Update By Query action plugin

UpdateByQueryResponse throwing timeout #47

Open Praveen82 opened 9 years ago

Praveen82 commented 9 years ago

Hi All,

I am using "elasticsearch-action-updatebyquery".

Reference : https://github.com/yakaz/elasticsearch-action-updatebyquery4

API: The following call bulk-updates segment IDs on the matched documents.

Example: segmentId = 50 needs to be applied to more than 20 million documents.

Map<String, Object> scriptParams = new HashMap<>();
scriptParams.put("segmentexist", segId);
scriptParams.put("pgsegmentobject", pgSegmentIds);

UpdateByQueryClient updateByQueryClient = new UpdateByQueryClientWrapper(client);

UpdateByQueryResponse response = updateByQueryClient.prepareUpdateByQuery()
        .setIndices(props.getProperty("index"))
        .setTypes(props.getProperty("type"))
        .setTimeout(TimeValue.timeValueHours(24))
        .setIncludeBulkResponses(BulkResponseOption.ALL)
        .setScript("if (ctx._source.containsKey(\"pgSegmentIds\") ) { if (ctx._source.pgSegmentIds.contains(segmentexist) ) { ctx.op = \"none\" } else { ctx._source.pgSegmentIds += pgsegmentobject} } else { ctx._source.pgSegmentIds = pgsegmentobject }")
        .setScriptParams(scriptParams)
        .setQuery(query)
        .execute()
        .actionGet();
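To make the intent of the script easier to check, here is a minimal plain-Java sketch of what it does to a single document. It is only an illustration, not plugin code: the class and method names are hypothetical, and it assumes that `pgSegmentIds` is stored as a list and that `pgsegmentobject` is a list whose elements get appended by `+=`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the update script; not part of the plugin.
final class SegmentScriptSketch {

    /**
     * Applies the script's logic to one document source.
     * Returns false for the no-op case (the script's ctx.op = "none"),
     * true when the source was modified.
     */
    static boolean apply(Map<String, Object> source,
                         Object segmentExist,
                         List<Object> pgSegmentObject) {
        // Field missing: the script assigns the parameter as the new value.
        if (!source.containsKey("pgSegmentIds")) {
            source.put("pgSegmentIds", new ArrayList<>(pgSegmentObject));
            return true;
        }

        // Assumption: the existing field is a list.
        @SuppressWarnings("unchecked")
        List<Object> ids = (List<Object>) source.get("pgSegmentIds");

        // Segment already present: nothing to write for this document.
        if (ids.contains(segmentExist)) {
            return false;
        }

        // Otherwise append the new segment ids.
        ids.addAll(pgSegmentObject);
        return true;
    }
}
```

Documents that already contain the segment take the no-op path, so only the remaining documents should need to be rewritten.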

It is failing during the update. I see the following exception:

2015-09-12 05:58:10 INFO transport:123 - [Moon Knight] failed to get local cluster state for [#transport#-1][ip-10-186-199-195][inet[localhost/10.31.48.47:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/10.31.48.47:9300]][cluster/state] request_id [416] timed out after [5000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-09-12 05:58:10 INFO transport:123 - [Moon Knight] failed to get local cluster state for [PGMonetize-ES04][bqljhciDQ4-Tr2dRAcbWtw][ip-10-31-48-47][inet[/10.31.48.47:9300]]{master=true}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [PGMonetize-ES04][inet[/10.31.48.47:9300]][cluster/state] request_id [423] timed out after [5001ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have the following setup:

1) We have 5 nodes, with 5 shards.
2) The following settings are configured:
   script.disable_dynamic: false
   action.updatebyquery.bulk_size: 2500

I still get the above exception. Please help.

How can I solve this issue, and how can I improve performance (for example, updating 20+ million records in under 10 minutes)?
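To put the 10-minute target in numbers, here is a small back-of-the-envelope calculation in Java. It uses only the figures from this post (20 million documents, 10 minutes, 5 shards, a bulk size of 2500). The class and method names are made up for illustration, and it assumes documents are spread evenly across the shards and that bulks are issued per shard.

```java
// Hypothetical helper for estimating the required update rate.
final class ThroughputTarget {

    // Documents that must be processed each second to finish in time.
    static double docsPerSecond(long totalDocs, long seconds) {
        return (double) totalDocs / seconds;
    }

    // Bulk batches per second needed to sustain a given document rate.
    static double batchesPerSecond(double docsPerSecond, int bulkSize) {
        return docsPerSecond / bulkSize;
    }

    public static void main(String[] args) {
        long totalDocs = 20_000_000L; // documents to update
        long seconds = 10 * 60;       // 10-minute target
        int shards = 5;               // assumes an even spread over shards
        int bulkSize = 2500;          // action.updatebyquery.bulk_size

        double overall = docsPerSecond(totalDocs, seconds);
        double perShard = overall / shards;
        double batches = batchesPerSecond(perShard, bulkSize);

        System.out.printf("overall docs/s: %.0f%n", overall);
        System.out.printf("docs/s per shard: %.0f%n", perShard);
        System.out.printf("bulk batches/s per shard: %.2f%n", batches);
    }
}
```

Under those assumptions the cluster would have to sustain roughly 33,000 scripted updates per second overall, or about 6,700 per second on each shard.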

Praveen82 commented 8 years ago

Any update on this?

Praveen82 commented 8 years ago

Is there any way to update 20+ million documents in under 10 minutes?