opendistro-for-elasticsearch / sql

🔍 Open Distro SQL Plugin
https://opendistro.github.io/for-elasticsearch/features/SQL%20Support.html
Apache License 2.0
621 stars 186 forks source link

SELECT nested() cannot get sub properties #963

Open FreCap opened 3 years ago

FreCap commented 3 years ago

How to reproduce:


DELETE /my_index_nested_sub_field

PUT /my_index_nested_sub_field
{
  "mappings": {
    "properties": {
      "myNestedField": {
        "type": "nested"
      }
    }
  }
}

POST my_index_nested_sub_field/_doc/
{
  "id": 2,
  "myNestedField": [
    {
      "myNestedId": 7,
      "myNestedFieldString": "5",
      "myNestedFieldObject": {
        "val1":1,
        "val2":2,
        "val3":3
      }
    },
    {
      "myNestedId": 8,
      "myNestedFieldString": "3",
      "myNestedFieldObject": {
        "val1":4,
        "val2":5,
        "val3":6
      }
    }
  ]
}

1) Works as expected, contains both myNestedFieldObject and myNestedId


POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.myNestedFieldObject), nested(myNestedField.myNestedId) FROM my_index_nested_sub_field
  """
}

2) Doesn't work as expected: partial response, doesn't contain myNestedFieldObject


POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.*) FROM my_index_nested_sub_field
  """
}

Response doesn't contain myNestedFieldObject

{
  "schema": [
    {
      "name": "myNestedField.myNestedId",
      "type": "long"
    },
    {
      "name": "myNestedField.myNestedFieldString",
      "type": "text"
    }
  ],
  "total": 2,
  "datarows": [
    [
      7,
      "5"
    ],
    [
      8,
      "3"
    ]
  ],
  "size": 2,
  "status": 200
}

3) Doesn't work as expected: throws exception

POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.myNestedFieldObject.val1)  FROM my_index_nested_sub_field
  """
}

Response:
{
  "error": {
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": """Shard[0]: [my_index_nested_sub_field/WWDKmuGGSy6sdOcG8JkInA] QueryShardException[failed to create query: [nested] nested object under path [myNestedField.myNestedFieldObject] is not of nested type]; nested: IllegalStateException[[nested] nested object under path [myNestedField.myNestedFieldObject] is not of nested type];

For more details, please send request for Json format to see the raw response from elasticsearch engine.""",
    "type": "SearchPhaseExecutionException"
  },
  "status": 400
}

Same for adding myNestedField,

POST _opendistro/_sql
{
  "query": """
SELECT  nested(***myNestedField***,myNestedField.myNestedFieldObject.val1)  FROM my_index_nested_sub_field
  """
}

Response:
{
  "error": {
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": """Shard[0]: [my_index_nested_sub_field/WWDKmuGGSy6sdOcG8JkInA] QueryShardException[failed to create query: [nested] failed to find nested object under path [myNestedField.myNestedFieldObject.val1]]; nested: IllegalStateException[[nested] failed to find nested object under path [myNestedField.myNestedFieldObject.val1]];

For more details, please send request for Json format to see the raw response from elasticsearch engine.""",
    "type": "SearchPhaseExecutionException"
  },
  "status": 400
}
chloe-zh commented 3 years ago

Hi @FreCap , thanks for reporting this issue! I was able to reproduce these cases.

1 For the second case SELECT nested(myNestedField.*) FROM my_index_nested_sub_field, since the old engine was once not able to convert nested type to a standard type which our JDBC driver supported, so we disabled the nested field to be selected when the query is to select all (select *) or select wildcard (e.g. select myNestedField.*) in the legacy codebase.

Therefore the myNestedField.myNestedFieldObject is left out in the response. The workaround would be enable the nested type back since the type conversion is supported in the latest code version now. We will have it fixed once we have resources.

One option before the fix is to set the format to JSON (which are the search hits as response caught from the ES engine) rather than the default JDBC format by adding the param ?format=json at the end of the request, and it works well as I test out. For example:

POST _opendistro/_sql?format=json
{
  "query": "select nested(myNestedField.*) from nested_data"
}

response:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "nested_data",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "myNestedField" : [
            {
              "myNestedId" : 7,
              "myNestedFieldString" : "5",
              "myNestedFieldObject" : {
                "val1" : 1,
                "val2" : 2,
                "val3" : 3
              }
            },
            {
              "myNestedId" : 8,
              "myNestedFieldString" : "3",
              "myNestedFieldObject" : {
                "val1" : 4,
                "val2" : 5,
                "val3" : 6
              }
            }
          ]
        },
        "inner_hits" : {
          "myNestedField" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "nested_data",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "myNestedField",
                    "offset" : 0
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "myNestedId" : 7,
                    "myNestedFieldObject" : {
                      "val3" : 3,
                      "val2" : 2,
                      "val1" : 1
                    },
                    "myNestedFieldString" : "5"
                  }
                },
                {
                  "_index" : "nested_data",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "myNestedField",
                    "offset" : 1
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "myNestedId" : 8,
                    "myNestedFieldObject" : {
                      "val3" : 6,
                      "val2" : 5,
                      "val1" : 4
                    },
                    "myNestedFieldString" : "3"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

2 For case SELECT nested(myNestedField.myNestedFieldObject.val1) FROM my_index_nested_sub_field, the stack trace indicates that myNestedField.myNestedFieldObject is not of nested type, so the nested operation is not eligible here. Currently we do not support either object type in nested fields or deep nested fields yet, but we will have workaround to support nested fields in our new engine very soon. We will keep you updated once such features are good to release. You can also checkout our release notes in the repository or the announcements in the ODFE website to see the latest features. Thanks!

FreCap commented 3 years ago

Thank you @chloe-zh!

I saw I could use format=json, but in this phase of dev I was trying to switch to full jdbc given how hard to read the json format for complex aggregates can be.

, the stack trace indicates that myNestedField.myNestedFieldObject is not of nested type, In which file do you find the stacktrace? I've been trying to look for it but I couldn't find it in my cluster

Thank you for keeping me up to date. Francesco

chloe-zh commented 3 years ago

Hi @FreCap , you can find the stack trace from es log in the node.

FreCap commented 3 years ago

Thanks @chloe-zh!

As a suggestion, it'd be great having the log in the response without having to go to the machine. Having tens of machines (or not having access to them for security reasons), it becomes an extremely challenging task to complete

FreCap commented 3 years ago

Do you have any suggestion on what I could do to contribute a fix? Any, even small, guidance would be greatly appreciated

chloe-zh commented 3 years ago

Could you checkout the method populateColumns in legacy/src/main/java/com/amazon/opendistroforelasticsearch/sql/legacy/executor/format/SelectResultSet.java to try for a workaround if you'd like to. The issue should be rooted around there.