opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.54k stars 1.75k forks source link

[BUG] Server errors propagated with index creation requests #7096

Open kotwanikunal opened 1 year ago

kotwanikunal commented 1 year ago

Describe the bug

Server errors within OpenSearch are exposed to the client when trying to create an index under certain conditions.


Condition 1: When the request body is numeric

PUT /testindex7?pretty HTTP/1.1
Host: ...
User-Agent: curl/7.86.0
Authorization: Basic [...]
Content-Type: application/json
Accept: application/xml
Connection: close
Content-Length: 1

0

Response:

HTTP/1.1 500 Internal Server Error
content-type: application/json; charset=UTF-8
content-length: 390

{
  "error" : {
    "root_cause" : [
      {
        "type" : "not_x_content_exception",
        "reason" : "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"
      }
    ],
    "type" : "not_x_content_exception",
    "reason" : "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"
  },
  "status" : 500
}

Condition 2: When cluster manager is unavailable Response:

HTTP/1.1 503 Service Unavailable
content-type: application/json; charset=UTF-8
content-length: 246

{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_manager_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "cluster_manager_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

To Reproduce Steps to reproduce the behavior:

  1. Create an index with the above conditions

Expected behavior

Host/Environment (please complete the following information):

Additional context

austintlee commented 1 year ago

For Condition 1, I found this test that verified RestController is returning the error with 400. Where do you think it's converting the status to 500?

https://github.com/opensearch-project/OpenSearch/blob/2f411883677bbffa1bd2e0192ca36f93cddafa66/server/src/test/java/org/opensearch/rest/RestControllerTests.java#L575-L587

dblock commented 1 year ago

Are we saying that forwarding this error is incorrect? When an index cannot be created because the server is broken, I think you want to see a 500 with the server details, don't you?

austintlee commented 1 year ago

I think 400 Bad Request makes sense for the first case since it's an invalid input. The error being returned is exposing the damage the bad input did to the system (what could have been avoided if the input was rejected sooner).

Condition 2 - I think "service unavailable' is the right response in this case.

I think I see two different things being asked here.

The response should abstract out the underlying reason for failure by sending a 400 (Bad Request) response back to the client

This is more about error translation. For every (public) API, we should identify what errors can be returned/thrown and they should be documented and we should have code in place that catches all errors thrown by the service and ensures they are mapped or translated to publicly documented errors and corresponding error messages.

This behavior can be made configurable where setting a property on the server disables or enables the stack trace and detailed response

This feels more like a debugging feature for someone who is writing server side code. In which case, I am not sure if this is needed as you can just go look at the server logs when something failed.

Sudershan29 commented 11 months ago

Hi! I would like to work on this!

I am just making sure I am on the same page. In the case of Condition 1, I need to examine why a 500 is being returned instead of a 400. In the case of Condition 2, I also completely agree that 5XX should be returned when the cluster manager is down.