ruflin / Elastica

Elastica is a PHP client for elasticsearch
http://elastica.io/
MIT License
2.26k stars 736 forks source link

Does RetryOnConflict suppose to work with Bulk updates #2182

Open csabavirag opened 12 months ago

csabavirag commented 12 months ago

I have an application which updates documents in bulk through multiple workers. I'd like to use the retry_on_conflict feature, but even if I have the setting at the ClientConfiguration, Bulk::addDocuments() does not seem to add the necessary piece to the Document metadata in order to get the proper update action as it is described at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-bulk.html#bulk-update

They say, every update should contain the retry_on_conflict attribute like this:

POST _bulk
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}

I only get the _id and _index values added to the document metadata. What do I do wrong?

ruflin commented 11 months ago

Can you share the exact code snippet you are using and the version of Elastica? There was in one version a change from _retry_on_conflict to retry_on_conflict. A quick search, i found there is a for a similar scenario you describe here: https://github.com/ruflin/Elastica/blob/0f6b04b5ebe6df51015fb54b4c455bd7af5b093f/tests/BulkTest.php#L694

csabavirag commented 11 months ago

Actually I'm using FOSElasticaBundle and they have added support for RetryOnConflict parameter per connection (https://github.com/FriendsOfSymfony/FOSElasticaBundle/pull/965) long time ago.

My config looks like:

fos_elastica:
    clients:
        default: { url: '%env(ELASTICSEARCH_URL)%', retryOnConflict: 5 }
    indexes:

which sets

image

However I can't see if the Elastica\Bulk class would take the parameter into consideration at all and would apply the same like in the test

https://github.com/ruflin/Elastica/blob/0f6b04b5ebe6df51015fb54b4c455bd7af5b093f/tests/BulkTest.php#L686C9

ruflin commented 11 months ago

Having a quick look at the code, this might be a bug / missing feature in Elastica. I couldn't find a place where the retry_on_conflict from the agent it taken into account in bulk :-( When adding a document, to bulk, it should be checked if it is an update and if retry on conflict is set on the client side. If yes, it should add it: https://github.com/ruflin/Elastica/tree/8.x/src#L134 Could you by chance make this modification to your version of Elastica and see if it solves the problem?

csabavirag commented 11 months ago

Sure, I will give a try. In the mean time, could you check the link (https://github.com/ruflin/Elastica/tree/8.x/src#L134). You might want to point to a file in the repo, but it got somehow removed.

ruflin commented 11 months ago

This is the file and line I wanted to link to: https://github.com/ruflin/Elastica/blob/8.x/src/Bulk.php#L134 Not sure where the file name was lost :-(

csabavirag commented 10 months ago

Please see my PR. It adds the RetryOnConflict value (if exists) from the Client's connection configuration. With this change, I was able to generate the similar output (with this attribute) which was pasted in the opening message.