acidul opened 11 months ago
Interesting... I suspect that the root cause may be the same as #2607, but this is another good test case.
Hopefully we can address both issues with one fix.
It looks a little complicated. If the document exists, the update API transforms the updateRequest into a new indexRequest, and the new doc in that indexRequest has already been merged with the existing document; see: https://github.com/opensearch-project/OpenSearch/blob/8673fa937db405b8d614f8d4a02c0aa52587c037/server/src/main/java/org/opensearch/action/update/UpdateHelper.java#L106
After that, the new indexRequest is sent to TransportBulkAction, which then executes the pipeline.
However, the bulk API doesn't do that transformation before executing the pipeline, so the two behaviors differ. We may not be able to transform the updateRequest in TransportBulkAction, because updateHelper.prepare() can only be called at the shard level.
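A rough Python model of the two orderings described above may make the difference concrete. It is a sketch, not OpenSearch code: merge(), update_api_path(), bulk_api_path() and total_pipeline() are hypothetical names, documents are plain dicts, and the stored document and the pipeline (which sums x, y and a field z that only the stored document has) are made up for illustration. It only models the doc_as_upsert case.

```python
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]
Pipeline = Callable[[Doc], Doc]


def merge(existing: Doc, partial: Doc) -> Doc:
    """Merge a partial doc into a stored doc: objects merge recursively, other values are replaced."""
    out = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def update_api_path(stored: Optional[Doc], partial: Doc, pipeline: Pipeline) -> Doc:
    # Update API: the partial doc is merged first (UpdateHelper), then the resulting
    # index request goes through TransportBulkAction, where the pipeline runs.
    merged = partial if stored is None else merge(stored, partial)
    return pipeline(merged)


def bulk_api_path(stored: Optional[Doc], partial: Doc, pipeline: Pipeline) -> Doc:
    # Bulk API: the pipeline runs on the partial doc alone; the merge happens later.
    processed = pipeline(partial)
    return processed if stored is None else merge(stored, processed)


def total_pipeline(ctx: Doc) -> Doc:
    # Hypothetical pipeline: sums x, y and z, treating missing fields as 0.
    out = dict(ctx)
    out["total"] = sum(ctx.get(field, 0) for field in ("x", "y", "z"))
    return out


stored = {"x": 1, "z": 10}   # hypothetical stored document
partial = {"x": 3, "y": 5}
print(update_api_path(stored, partial, total_pipeline))  # {'x': 3, 'z': 10, 'y': 5, 'total': 18}
print(bulk_api_path(stored, partial, total_pipeline))    # {'x': 3, 'z': 10, 'y': 5, 'total': 8}
```

With the update API the pipeline sees the merged document, so the stored z is part of the sum; with the bulk API it only sees the partial doc, so the stored fields are invisible to it even though they survive the later merge.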
Another finding is that, in the bulk API, the pipeline also behaves differently for upsert and doc_as_upsert, because of this line:
https://github.com/opensearch-project/OpenSearch/blob/8673fa937db405b8d614f8d4a02c0aa52587c037/server/src/main/java/org/opensearch/action/bulk/TransportBulkAction.java#L224
When the document with ID 1 exists and doc_as_upsert is true, the pipeline is executed on the partial doc {"x":3, "y":5}:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "update": { "_index": "test1", "_id": "1" } }
{ "doc" : {"x":3, "y":5}, "doc_as_upsert":true}
'
But when upsert is set, the pipeline is executed on the upsert doc {"x":1} instead, so nothing changes, because that doc is not used when the document already exists:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "update": { "_index": "test1", "_id": "1" } }
{ "doc" : {"x":3, "y":5}, "upsert":{"x":1}}
'
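A small Python model of this selection, assuming the linked line picks the doc when doc_as_upsert is true and the upsert doc otherwise (the behavior described above). The function names, the tagging pipeline and the stored document {"x": 1} are hypothetical; the two update bodies are the ones from the curl commands above.

```python
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]


def bulk_pipeline_input(update: Dict[str, Any]) -> Optional[Doc]:
    # Which source the bulk API feeds to the pipeline for an update action
    # (modelled on the linked TransportBulkAction line, as described above).
    return update["doc"] if update.get("doc_as_upsert") else update.get("upsert")


def tag_pipeline(ctx: Doc) -> Doc:
    # Hypothetical pipeline: marks every document it processes.
    out = dict(ctx)
    out["processed"] = True
    return out


def bulk_update_existing(stored: Doc, update: Dict[str, Any],
                         pipeline: Callable[[Doc], Doc]) -> Doc:
    """Model a bulk update action whose target document already exists."""
    source = bulk_pipeline_input(update)
    processed = pipeline(source) if source is not None else None
    # With doc_as_upsert the pipeline output is the partial doc that gets merged.
    # With upsert, the processed upsert doc is discarded and the raw doc is merged.
    partial = processed if update.get("doc_as_upsert") else update["doc"]
    return {**stored, **partial}  # a shallow merge is enough for these flat docs


stored = {"x": 1}  # hypothetical contents of document 1
as_upsert = {"doc": {"x": 3, "y": 5}, "doc_as_upsert": True}
with_upsert = {"doc": {"x": 3, "y": 5}, "upsert": {"x": 1}}

print(bulk_pipeline_input(as_upsert))    # {'x': 3, 'y': 5}
print(bulk_pipeline_input(with_upsert))  # {'x': 1}
print(bulk_update_existing(stored, as_upsert, tag_pipeline))    # {'x': 3, 'y': 5, 'processed': True}
print(bulk_update_existing(stored, with_upsert, tag_pipeline))  # {'x': 3, 'y': 5}
```

In the upsert case the pipeline does run, but only on a document that is thrown away, so the stored result carries no trace of it.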
Describe the bug
A single upsert works as expected with an ingest pipeline, but the same operation in a bulk upsert doesn't give the same result.
To Reproduce
Steps to reproduce the behavior:
Single update:
POST index-duration/_update/doc_duration
{ "doc" : { "event_min": 3, "event_max": 5, "event_name": "occurrence_2" }, "doc_as_upsert": true }
Result:
"_source": { "event_end": 5, "old_duration": "Old duration was : 1", "event_duration": 4, "event_min": 1, "event_name": "occurrence_2", "event_max": 5, "event_begin": 1 }
Bulk updates:
POST _bulk
{ "update": { "_index": "index-duration", "_id": "doc_duration_issue" } }
{ "doc" : { "event_min": 1, "event_max": 2, "event_name": "occurrence_1"},"doc_as_upsert": true}

POST _bulk
{ "update": { "_index": "index-duration", "_id": "doc_duration_issue" } }
{ "doc" : { "event_min": 3, "event_max": 5, "event_name": "occurrence_2"},"doc_as_upsert": true}
Result:
"_source": { "event_end": 5, "old_duration": "Old duration was : null", "event_duration": 2, "event_min": 3, "event_name": "occurrence_2", "event_max": 5, "event_begin": 3 }
Expected behavior
The bulk upsert should give the same result as the single upsert:
"_source": { "event_end": 5, "old_duration": "Old duration was : 1", "event_duration": 4, "event_min": 1, "event_name": "occurrence_2", "event_max": 5, "event_begin": 1 }
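The issue does not show the pipeline itself. As a hypothetical illustration, the Python sketch below uses a stand-in pipeline (duration_pipeline, an assumed name and logic) chosen only because it reproduces both _source results above; the real pipeline may well differ. It replays the occurrence_1 update followed by the occurrence_2 update through a model of each API path: merge, then pipeline, for the update API; pipeline, then merge, for the bulk API.

```python
from typing import Any, Dict, Optional

Doc = Dict[str, Any]


def duration_pipeline(ctx: Doc) -> Doc:
    # Hypothetical stand-in for the issue's ingest pipeline (not shown in the issue).
    out = dict(ctx)
    prev = ctx.get("event_duration")
    # Mimics Java/Painless string concatenation, where a null value becomes "null".
    out["old_duration"] = "Old duration was : " + ("null" if prev is None else str(prev))
    # Widen the stored interval (if any) with the incoming min/max.
    begin = min(ctx.get("event_begin", ctx["event_min"]), ctx["event_min"])
    end = max(ctx.get("event_end", ctx["event_max"]), ctx["event_max"])
    out.update(event_begin=begin, event_end=end, event_duration=end - begin,
               event_min=begin, event_max=end)
    return out


def update_api(stored: Optional[Doc], partial: Doc) -> Doc:
    # Update API model: merge into the stored doc first, then run the pipeline.
    merged = partial if stored is None else {**stored, **partial}
    return duration_pipeline(merged)


def bulk_api(stored: Optional[Doc], partial: Doc) -> Doc:
    # Bulk API model: run the pipeline on the partial doc, then merge at shard level.
    processed = duration_pipeline(partial)
    return processed if stored is None else {**stored, **processed}


first = {"event_min": 1, "event_max": 2, "event_name": "occurrence_1"}
second = {"event_min": 3, "event_max": 5, "event_name": "occurrence_2"}

single = update_api(update_api(None, first), second)
bulk = bulk_api(bulk_api(None, first), second)
print(single["event_duration"], single["old_duration"])  # 4 Old duration was : 1
print(bulk["event_duration"], bulk["old_duration"])      # 2 Old duration was : null
```

Because the bulk path runs the pipeline on the partial doc alone, event_duration and event_begin are absent when it runs, so old_duration becomes "null" and the interval restarts at event_min 3; the merge then overwrites every stored field with those values.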