ruflin / Elastica

Elastica is a PHP client for elasticsearch
http://elastica.io/
MIT License
2.26k stars 736 forks source link

UpdateByQuery issue #2226

Open rtwent opened 2 months ago

rtwent commented 2 months ago

Given: ES: 7.17 ruflin/elastica: 7.2.0 (but the issue is present on the dev version too)

Background: Due to architectural issues - we run updateByQuery in loop. Troubles in debugging had appeared.

By the ES documentation the response of UpdateByQuery request is documented. It seems, for now, is impossible to retrieve the message, why request was failed.

Response::HasError() will always return false, because of error property is not documented for _update_by_query. As a result error message will be empty: https://github.com/ruflin/Elastica/blob/e2cb89b449ef16e770208618d9a81697946806fd/src/Response.php#L134-L139

isOk is a way to identify the response status. But still, no error message will be provided.

Example of failed response from elastic:

{
  "took":2,
  "timed_out":false,
  "total":1,
  "updated":0,
  "deleted":0,
  "batches":1,
  "version_conflicts":1,
  "noops":0,
  "retries":{
    "bulk":0,
    "search":0
  },
  "throttled_millis":0,
  "requests_per_second":-1,
  "throttled_until_millis":0,
  "failures":[
    {
      "index":"xxxxx",
      "type":"_doc",
      "id":"xxxxxx",
      "cause":{
        "type":"version_conflict_engine_exception",
        "reason":"[xxxxxx]: version conflict, required seqNo [15668298], primary term [2]. current document has seqNo [15668306] and primary term [2]",
        "index_uuid":"wUFh0yvBRkavh7cCHAxWcg",
        "shard":"1",
        "index":"xxxxx"
      },
      "status":409
    }
  ]
}
ruflin commented 1 month ago

Is my understanding correct that the info you are looking for is something like getFailures()? There is hasFailedShards() but it only returns a bool: https://github.com/ruflin/Elastica/blob/e2cb89b449ef16e770208618d9a81697946806fd/src/Response.php#L146

rtwent commented 1 month ago

Is my understanding correct that the info you are looking for is something like getFailures()?

Yep, something like that.

But to be honest for clients of the library is more convenient way to use generic ::getErrorMessage() than to keep in mind what kind of update is processed:

$response = $this->index->updateByQuery(...);
if ($response->isOk()) {
  $response->getData();
} else {
  $response->getErrorMessage();
}

(This is just my IMO that is not consistent with ES API)

ruflin commented 4 weeks ago

@rtwent Would you be interested to take a stab at this and open a PR?

rtwent commented 3 weeks ago

It seems that my problem is in using the obsolete version of Your library. I investigated a bit. We use 7.2.0 and the code below gave us unexpected behavior: https://github.com/ruflin/Elastica/blob/8fe61767b604a10e40b0b59c9511c4317cae3cb8/src/Transport/Http.php#L177-L179 since there is no error in the reponse body (only failures) - the response is successfully populated. No exception is thrown.

I forked the project and tried to reproduce such behaviour on 8.* branch. I was not successful. Http transport now is absent and guzzle throws an exception if something is wrong with the status code:

/**
* @group functional
*/
public function testUpdateByQueryFailure(): void
{
    $index = $this->_createIndex();
    $index->addDocument(new Document('1', ['name' => 'ruflin nicolas']),);
    $index->refresh();

    $this->assertEquals(1, $index->search('nicolas')->count());

    $this->expectException(ClientResponseException::class);
    // force updateByQuery failure by changing immutable field
    $index->updateByQuery('*', new Script('ctx._source._id = "marc"'));
}  

I hope I understood everything correctly. Is there a sense to fix it for ^7.2?