Open rtkjbillo opened 10 years ago
Thanks for the report; your analysis is correct.
I bumped elasticsearch-support version to 1.2.1.0 which comes with an extra waitForResponses() method to prevent premature closes of BulkProcessor.
Thanks for the quick fix! I see that http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-support/1.2.1.0/elasticsearch-support-1.2.1.0.zip is now available. Is a compiled 1.2.1 version of elasticsearch-knapsack forthcoming as well, so that we can install it with the plugin command?
We ended up patching this further in elasticsearch-support by waiting for all threads to complete for a maximum of 60 seconds, along with some other changes to elasticsearch-knapsack that were useful for our purposes:
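For readers who want to picture the "wait for all threads to complete for a maximum of 60 seconds" idea, here is a minimal sketch using only the JDK. It is not the actual patch: the class `BoundedShutdown`, the methods `shutdownAndWait` and `runDemo`, and the four simulated 1000-record batches are all hypothetical stand-ins for the bulk worker threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the bulk worker pool; not the elasticsearch-support code.
class BoundedShutdown {

    // Stops accepting new work, then waits up to timeoutSeconds for queued and
    // running tasks. Returns true if everything finished within the timeout.
    static boolean shutdownAndWait(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
            pool.shutdownNow(); // timed out: interrupt whatever is still running
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Simulates 4 concurrent bulk requests of 1000 records each (assumed numbers).
    static int runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger written = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(100); // pretend the node takes a while to respond
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // batch lost if we are interrupted before the response
                }
                written.addAndGet(1000);
            });
        }
        shutdownAndWait(pool, 60);
        return written.get();
    }

    public static void main(String[] args) {
        System.out.println("records written: " + runDemo());
    }
}
```

The point of the bounded wait is that a stuck node cannot hang the import forever, while a healthy node gets enough time to acknowledge the last batches.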
Using ElasticSearch 1.1, we are seeing a problem with this plugin's import mechanism: it throws stack traces and does not insert all of the data from the output .tar.gz file.
At the end of the bulk import process, ElasticSearch throws NoNodeAvailableException(s), and their number corresponds to the maxBulkConcurrency setting. The number of records missing in the destination ElasticSearch instance appears to stay below (maxBulkConcurrency * maxActionsPerBulkRequest), so on a 4-threaded configuration with 1000 actions per bulk request, we could see up to 4000 missing entries.
Stacktrace:
Tracking this down through the source for elasticsearch-knapsack and elasticsearch-support, I believe we are running into a "failure to flush" condition when closing out the import, where not all records are successfully written before each thread is terminated.
In `src/main/java/org/xbib/elasticsearch/action/RestImportAction.java`, after the `logger.info("end of import: {}", status);` call, a method call is made to `bulkClient.shutdown();`. This in turn calls `BulkTransportClient`'s `super.shutdown();` method, which runs the following code in `BaseTransportClient`:

At no point does the plugin or support mechanism appear to wait. My suggestion would be to add a line in `RestImportAction.java` to perform `bulkClient.flush()` prior to the shutdown call, or update the elasticsearch-support plugin to ensure that a shutdown waits for pending actions to complete.
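To make the suggestion concrete, here is a rough, JDK-only model of a shutdown that flushes first and then waits for pending actions. Everything in it is an assumption for illustration: `SketchBulkClient`, its fields, and `importDocs` are hypothetical names, and the executor merely stands in for the asynchronous transport. It does not reproduce the real `BulkTransportClient` or `BaseTransportClient` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a bulk client; names are illustrative only.
// add() and flush() are assumed to be called from a single import thread.
class SketchBulkClient {
    private final int maxConcurrency;
    private final int maxActionsPerBulk;
    private final Semaphore inFlight;        // one permit per outstanding bulk request
    private final ExecutorService transport; // stands in for the async transport
    private final List<String> buffer = new ArrayList<>();
    private final AtomicInteger indexed = new AtomicInteger();

    SketchBulkClient(int maxConcurrency, int maxActionsPerBulk) {
        this.maxConcurrency = maxConcurrency;
        this.maxActionsPerBulk = maxActionsPerBulk;
        this.inFlight = new Semaphore(maxConcurrency);
        this.transport = Executors.newFixedThreadPool(maxConcurrency);
    }

    // Buffers one action and sends a bulk request once the buffer is full.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxActionsPerBulk) {
            flush();
        }
    }

    // Sends whatever is buffered as one asynchronous bulk request.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        inFlight.acquireUninterruptibly(); // blocks while maxConcurrency requests are outstanding
        transport.execute(() -> {
            try {
                Thread.sleep(10);                // simulated round trip to the node
                indexed.addAndGet(batch.size()); // simulated acknowledgement
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // batch is lost if interrupted
            } finally {
                inFlight.release();
            }
        });
    }

    // Flushes the partial buffer, then waits until every outstanding request has
    // responded (all permits are back) or the timeout elapses, and only then closes.
    boolean shutdown(long timeout, TimeUnit unit) {
        flush();
        boolean drained = false;
        try {
            drained = inFlight.tryAcquire(maxConcurrency, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        transport.shutdownNow();
        return drained;
    }

    int indexedCount() {
        return indexed.get();
    }

    // Imports n documents with 4 concurrent requests of up to 1000 actions each.
    static int importDocs(int n) {
        SketchBulkClient client = new SketchBulkClient(4, 1000);
        for (int i = 0; i < n; i++) {
            client.add("doc-" + i);
        }
        client.shutdown(60, TimeUnit.SECONDS);
        return client.indexedCount();
    }
}
```

In this model, calling `transport.shutdownNow()` without the preceding `flush()` and `tryAcquire(...)` would interrupt the requests still in flight and drop the partially filled buffer, which matches the symptom of losing up to (maxBulkConcurrency * maxActionsPerBulkRequest) records at the end of an import.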