opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.43k stars 1.72k forks source link

[BUG] Different error response in delete API #14805

Open imvtsl opened 1 month ago

imvtsl commented 1 month ago

Describe the bug

Response body of failures on create index API, delete index API, create document API, etc follow a consistent format. However, response body of failure in delete document API is different. It is similar to the response body of Success scenario. See all responses below.

I believe the best practice is for error responses to be standardized across all APIs. It makes error handling for the client easier. See this issue.

Sample Responses

Index create failed

{
    "error": {
        "root_cause": [
            {
                "type": "resource_already_exists_exception",
                "reason": "index [movies/Um9KkSRHQlyaVClh-O6GRw] already exists",
                "index": "movies",
                "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
            }
        ],
        "type": "resource_already_exists_exception",
        "reason": "index [movies/Um9KkSRHQlyaVClh-O6GRw] already exists",
        "index": "movies",
        "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
    },
    "status": 400
}

Index delete failed

{
    "error": {
        "root_cause": [
            {
                "type": "index_not_found_exception",
                "reason": "no such index [games]",
                "index": "games",
                "resource.id": "games",
                "resource.type": "index_or_alias",
                "index_uuid": "na"
            }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index [games]",
        "index": "games",
        "resource.id": "games",
        "resource.type": "index_or_alias",
        "index_uuid": "na"
    },
    "status": 404
}

Document create failed

{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[2]: version conflict, document already exists (current version [1])",
                "index": "movies",
                "shard": "0",
                "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[2]: version conflict, document already exists (current version [1])",
        "index": "movies",
        "shard": "0",
        "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
    },
    "status": 409
}

Document delete failed

{
    "_index": "movies",
    "_id": "3",
    "_version": 1,
    "result": "not_found",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 36,
    "_primary_term": 2
}

The above follows the same format as Success scenario:

Document delete success

{
    "_index": "sample-index",
    "_id": "1",
    "_version": 2,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1
}

Related component

Indexing

To Reproduce

  1. Create index, say 'sample-index'. curl -X PUT localhost:9200/sample-index
  2. Delete non existing document in this index.
    curl -X DELETE localhost:9200/sample-index/_doc/1

Expected behavior

Error response format in case of delete document failure should be same as failure in other APIs.

Additional Details

No response

imvtsl commented 1 month ago

I can't find the info on error responses for the APIs in the API spec or the API documentation.

Or am i missing something?

mgodwan commented 1 month ago

[Indexing Triage Meeting 07/22]

This seems a fairly reasonable ask to align the error response across APIs.

  1. If we change the response completely, its definitely a breaking change for existing integrations. We need to be careful in how to make this change
  2. In case of adding new fields as well, can we verify if existing clients continue to work with the new fields in the response or if they break
  3. There are few good fields the API output in error cases (i.e. seq no, primary term, _id) which we should see how to retain.

@imvtsl Could you help checking on these aspects?

imvtsl commented 1 month ago

As discussed in the meeting today, I have created issues in Documentation and API specification repositories for missing error responses.

imvtsl commented 1 month ago
  1. If we change the response completely, its definitely a breaking change for existing integrations. We need to be careful in how to make this change

Is a breaking change acceptable if it's part of a major release (e.g., 3.0)? If not, we could implement API versioning (e.g., /v2/sample-index/_doc/1) to maintain the current API version's behavior.

  1. In case of adding new fields as well, can we verify if existing clients continue to work with the new fields in the response or if they break

@peternied did suggest something similar to this approach for backwards compatibility. New fields in delete API error response would be error block, something like below:

{
    "_index": "sample-index",
    "_id": "1",
    "_version": 2,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1,
    "error": {
        "root_cause": [
            {
                "type": "version_not_found_exception",
                "reason": "[2]: version not found, document does not exists (current version [1])",
                "index": "sample-index",
                "shard": "0",
                "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
            }
        ],
        "type": "version_not_found_exception",
        "reason": "[2]: version not found, document does not exists (current version [1])",
        "index": "sample-index",
        "shard": "0",
        "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
    },
    "status": 404
}

Thanks @peternied for the suggestion.

This approach maintains backward compatibility but results in non-uniform JSON across APIs. Besides the "error" key, all other keys are specific to individual APIs, which could undermine the goal of having a standardized error response format. Additionally, existing clients might encounter issues if they are configured to fail on unknown properties during deserialization (see Jackson's FAIL_ON_UNKNOWN_PROPERTIES setting). Configuration like this could inadvertently break backward compatibility.

Alternatively, can we nest those into a free form object or a string? See below.

  1. There are few good fields the API output in error cases (i.e. seq no, primary term, _id) which we should see how to retain.

We could introduce a "message" key to handle additional fields in a free-form object or string. When required, API specific clients would parse this to get client specific info. So, the json response would look something like below.

{
    "message": {
        "_index": "sample-index",
        "_id": "1",
        "_version": 2,
        "result": "deleted",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1
    },
    "error": {
        "root_cause": [
            {
                "type": "version_not_found_exception",
                "reason": "[2]: version not found, document does not exists (current version [1])",
                "index": "sample-index",
                "shard": "0",
                "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
            }
        ],
        "type": "version_not_found_exception",
        "reason": "[2]: version not found, document does not exists (current version [1])",
        "index": "sample-index",
        "shard": "0",
        "index_uuid": "Um9KkSRHQlyaVClh-O6GRw"
    },
    "status": 404
}

Because "message" is a free form object, clients on a high level have to define one generic error response object for all APIs (error, message and status keys). Specific API clients would override this message object and parse it in the API specific way. This approach does break backwards compatibility, but leads to generalization.

Thoughts, @peternied , @mgodwan , @dblock ?

imvtsl commented 1 month ago

I debugged the code to understand the flow of the delete API call. I have summarized it briefly:

InternalEngine returns DeleteResult with resultType set to "SUCCESS" and failure field set to null, if the document to be deleted is not found. I am not sure why it was designed in this way to return "SUCCESS". This results in the BulkPrimaryExecutionContext to treat the response as a success. That's why the response body of delete success and delete 404 are the same.

I compared it with the code flow in case create document request fails. When a create document request fails, resultType is set to "FAILURE" and failure is set to VersionConflictEngineException. This leads to the response to be treated as failure.

To address this issue, we could modify the delete handling to return resultType set to FAILED and failure set to DocumentMissingException. Then, the response from the transport layer would be treated as failure by the REST layer. And then we can return the response in the format decided.

imvtsl commented 1 month ago

On a different note, do we have any strict timelines to work on the issues. We triaged it about two weeks ago, and I was away for some time, so I could only recently begin working on it. I wanted to check if we are okay?

dblock commented 1 month ago

On a different note, do we have any strict timelines to work on the issues. We triaged it about two weeks ago, and I was away for some time, so I could only recently begin working on it. I wanted to check if we are okay?

There are no expectations of anyone doing any work or timelines as with most open source.