richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Make the river more resistant to bulk import failures #570

Open PeterBackman opened 8 years ago

PeterBackman commented 8 years ago

Hi, in our system we occasionally get documents that cannot be imported into ES. We don't have full control of the input, so sometimes we hit the 32k term-length limit in Lucene, which prevents the document from being indexed. The bulk import fails and the river is stopped.

Locally I made a patch so that afterBulk(long executionId, BulkRequest request, BulkResponse response) logs an error and continues without stopping the river. It seems to work fine.
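Roughly, the patched listener looks like the sketch below. This is a simplified illustration against the Elasticsearch 1.x BulkProcessor API, not the river's actual code; the class name and log messages are made up for the example.

```java
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;

// Illustrative listener: log failed bulk items and keep running
// instead of shutting down the river.
public class LenientBulkListener implements BulkProcessor.Listener {

    private final ESLogger logger = Loggers.getLogger(LenientBulkListener.class);

    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        // Nothing to do before the bulk executes.
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        // Log each failed item (e.g. a document rejected for exceeding
        // Lucene's term-length limit) and carry on with the next batch.
        if (response.hasFailures()) {
            for (BulkItemResponse item : response) {
                if (item.isFailed()) {
                    logger.error("Bulk item [{}/{}/{}] failed: {}",
                            item.getIndex(), item.getType(), item.getId(),
                            item.getFailureMessage());
                }
            }
        }
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        // Failure of the whole bulk request (e.g. a connection error);
        // whether to keep running here too is a separate question.
        logger.error("Bulk request [{}] failed", failure, executionId);
    }
}
```

The key point is that per-item failures are reported via the BulkResponse rather than an exception, so the listener can skip the bad documents while the rest of the batch is indexed normally.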

Is there a reason the river must be stopped, or would the above change be interesting?

ankon commented 8 years ago

I guess the main question here is whether someone actually reads the logs :)

In our case stopping the river is preferable, because we can monitor that easily and then call in a human to investigate the failure. For us there is no such thing as an ignorable import failure. Other setups may differ, though, so even if the change never becomes the default behaviour (to avoid accidental foot-shooting), it could still be useful as an option for people who can tolerate some data loss.