Open NehaV0307 opened 5 days ago
Similar issue: https://github.com/opensearch-project/OpenSearch/issues/10864. The root cause is that the Update API converts the updateRequest to an indexRequest if the document exists, so the default ingest pipeline is executed, whereas the Bulk API keeps the original updateRequest.
From the code, I think ingest pipelines were designed only for index operations, not for update operations. The Index API supports the pipeline
parameter but the Update API doesn't, so maybe we should prevent the default ingest pipeline from being executed in the Update API.
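To make the difference concrete, here is a toy Python model of the behavior described above. It is not OpenSearch code: the `Request` class and the functions `resolve_pipeline`, `single_update`, and `bulk_update_item` are hypothetical names invented for illustration, and the rule "only index requests pick up the default pipeline" is just a restatement of the explanation in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for a write request (not an OpenSearch class)."""
    kind: str  # "index" or "update"
    doc_id: str


def resolve_pipeline(request: Request, default_pipeline: Optional[str]) -> Optional[str]:
    # Assumed rule from the discussion: only index requests run the
    # index's default ingest pipeline.
    return default_pipeline if request.kind == "index" else None


def single_update(doc_id: str, default_pipeline: Optional[str]) -> Optional[str]:
    # As described above, the Update API converts the update of an
    # existing document into an index request, so the pipeline applies.
    return resolve_pipeline(Request("index", doc_id), default_pipeline)


def bulk_update_item(doc_id: str, default_pipeline: Optional[str]) -> Optional[str]:
    # In the Bulk API the item stays an update request, so no pipeline runs.
    return resolve_pipeline(Request("update", doc_id), default_pipeline)


print(single_update("1", "update_timestamp"))
print(bulk_update_item("1", "update_timestamp"))
```

In this model the single update resolves to the default pipeline and the bulk update item resolves to none, which matches what the reporter observed.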
For this use case, I've tried to find a workaround. One option is to use a Painless script to set the updated
field, like this:
POST /on_boarding_employees-1/_update/1
{
  "script": {
    "source": "ctx._source.updated =ctx._now;ctx._source.type=params.type",
    "params": {
      "type": "ONBOARDING_EMPLOYEE_UPDATED"
    }
  }
}
or
POST /on_boarding_employees-1/_bulk
{"update":{"_id":"1"}}
{"script":{"source":"ctx._source.updated =ctx._now;ctx._source.type=params.type","params":{"type":"ONBOARDING_EMPLOYEE_UPDATED"}}}
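If the bulk body is generated by a client, a small helper can build the newline-delimited payload for this scripted update. The sketch below only constructs the request body and does not send it; the function name `build_scripted_bulk_update` and the document IDs in the usage line are made up for illustration.

```python
import json

# Same Painless source as in the workaround above.
SCRIPT_SOURCE = "ctx._source.updated =ctx._now;ctx._source.type=params.type"


def build_scripted_bulk_update(doc_ids, new_type):
    """Build an NDJSON _bulk body with one scripted update per document ID."""
    lines = []
    for doc_id in doc_ids:
        # Action line, followed by the script that performs the update.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({
            "script": {"source": SCRIPT_SOURCE, "params": {"type": new_type}}
        }))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_scripted_bulk_update(["1", "2"], "ONBOARDING_EMPLOYEE_UPDATED")
print(body, end="")
```

One thing to check before relying on this: as far as I know, ctx._now is an epoch-millisecond value, so the updated field written by the script may not have the same format as the ISO timestamp that the ingest pipeline writes.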
@andrross @macohen @reta what do you think about this?
Thanks @gaobinlong for looking into it
From the code, I think ingest pipelines were designed only for index operations, not for update operations. The Index API supports the pipeline parameter but the Update API doesn't, so maybe we should prevent the default ingest pipeline from being executed in the Update API.
Found this long thread on the matter [1]. TL;DR: the Update API does not support ingest pipelines, so we should probably document that (and prevent it if possible).
Thanks @reta, I've created a documentation issue for this and will open a PR later.
For the code, does it make sense to return a deprecation warning in 2.x for the Update API and then remove the support in 3.0.0? It may be a breaking change for some users.
Thanks @gaobinlong
Thanks @reta, I've created a documentation issue for this and will open a PR later.
:+1:
For the code, does it make sense to return a deprecation warning in 2.x for the Update API and then remove the support in 3.0.0?
But this functionality does not work, does it?
Describe the bug
The ingest pipeline works fine for single create, index, and update calls. It also works for bulk create and bulk index. Only bulk update does not run the pipeline.
Related component
Other
To Reproduce
1. Create ingest pipeline
PUT _ingest/pipeline/update_timestamp
{
  "description": "Automatically updates the 'updated' field on insert or update",
  "processors": [
    {
      "set": {
        "field": "updated",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
Output
{ "acknowledged": true }
2. Create index
PUT /on_boarding_employees-1
{
  "settings": {
    "index": {
      "default_pipeline": "update_timestamp"
    }
  }
}
Output
{ "acknowledged": true, "shards_acknowledged": true, "index": "on_boarding_employees-1" }
Adding Doc:
POST /on_boarding_employees-1/_doc
{
  "type": "ONBOARDING_EMPLOYEE",
  "name": "Rahul"
}
Output
{ "_index": "on_boarding_employees-1", "_id": "9f2pM5MB70XT8uT4kP1K", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }
Match query Output:
{ "took": 620, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "on_boarding_employees-1", "_id": "9f2pM5MB70XT8uT4kP1K", "_score": 1, "_source": { "name": "Rahul", "type": "ONBOARDING_EMPLOYEE", "updated": "2024-11-16T06:29:30.826236733Z" } } ] } }
Normal Update:
POST /on_boarding_employees-1/_update/9f2pM5MB70XT8uT4kP1K
{
  "doc": {
    "type": "ONBOARDING_EMPLOYEE_UPDATED"
  }
}
Output
{ "_index": "on_boarding_employees-1", "_id": "9f2pM5MB70XT8uT4kP1K", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 1, "_primary_term": 1 }
Match query Output:
{ "took": 268, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "on_boarding_employees-1", "_id": "9f2pM5MB70XT8uT4kP1K", "_score": 1, "_source": { "name": "Rahul", "type": "ONBOARDING_EMPLOYEE_UPDATED", "updated": "2024-11-16T06:33:05.478645288Z" } } ] } }
Bulk Update:
POST /on_boarding_employees-1/_bulk?pipeline=update_timestamp
{"update":{"_id":"9f2pM5MB70XT8uT4kP1K"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE14","name":"Aman2"}}
{"update":{"_id":"9v2xM5MB70XT8uT4uv0x"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE13","name":"Neha"}}
Match query Output:
{ "took": 777, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 2, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "on_boarding_employees-1", "_id": "9v2xM5MB70XT8uT4uv0x", "_score": 1, "_source": { "name": "Neha", "type": "ONBOARDING_EMPLOYEE13", "updated": "2024-11-16T06:38:25.841280080Z" } }, { "_index": "on_boarding_employees-1", "_id": "9f2pM5MB70XT8uT4kP1K", "_score": 1, "_source": { "name": "Aman2", "type": "ONBOARDING_EMPLOYEE14", "updated": "2024-11-16T06:33:05.478645288Z" } } ] } }
Expected behavior
The bulk update should also refresh the updated field, but its value remains the same after the bulk operation: "updated": "2024-11-16T06:33:05.478645288Z"
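A quick way to confirm this from the search responses is to compare the updated values before and after the bulk call. The Python sketch below does that on trimmed copies of the two match query outputs above; the helper names `updated_by_id` and `stale_ids` are made up for illustration, and only documents present in both snapshots are compared.

```python
def updated_by_id(search_response):
    """Map each hit's _id to its _source.updated value."""
    return {
        hit["_id"]: hit["_source"].get("updated")
        for hit in search_response["hits"]["hits"]
    }


def stale_ids(before, after):
    """Return IDs present in both snapshots whose 'updated' value did not change."""
    return sorted(
        doc_id for doc_id, ts in after.items()
        if doc_id in before and before[doc_id] == ts
    )


# Trimmed from the match query output after the normal update.
before_bulk = {"hits": {"hits": [
    {"_id": "9f2pM5MB70XT8uT4kP1K",
     "_source": {"updated": "2024-11-16T06:33:05.478645288Z"}},
]}}

# Trimmed from the match query output after the bulk update.
after_bulk = {"hits": {"hits": [
    {"_id": "9v2xM5MB70XT8uT4uv0x",
     "_source": {"updated": "2024-11-16T06:38:25.841280080Z"}},
    {"_id": "9f2pM5MB70XT8uT4kP1K",
     "_source": {"updated": "2024-11-16T06:33:05.478645288Z"}},
]}}

print(stale_ids(updated_by_id(before_bulk), updated_by_id(after_bulk)))
```

The second document is skipped because no earlier snapshot of it is shown in this report.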
Additional Details
No response