strapdata / elassandra

Elassandra = Elasticsearch + Apache Cassandra
http://www.elassandra.io
Apache License 2.0

Nested objects with inner hits query returns "extracted source isn't an object or an array" #346

Closed convell closed 4 years ago

convell commented 4 years ago

Elassandra version: The behavior is seen on the official Docker images 6.8.4.0 through 6.8.4.5

Plugins installed: None

JVM version (java -version): The 6.8.4.5 Docker image runs openjdk version "1.8.0_252"

OS version (uname -a if on a Unix-like system): Running in a container on a host running 18.04.1-Ubuntu

Description of the problem including expected versus actual behavior: I am upgrading from 6.2.3.28 to 6.8.4.5 and have noticed that queries with inner hits on nested objects (as outlined in the steps below) fail with the following error:

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_state_exception",
                "reason": "extracted source isn't an object or an array"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "nested_data",
                "node": "5c622bfa-fef0-4dd0-b040-5a68a767beb5",
                "reason": {
                    "type": "illegal_state_exception",
                    "reason": "extracted source isn't an object or an array"
                }
            }
        ],
        "caused_by": {
            "type": "illegal_state_exception",
            "reason": "extracted source isn't an object or an array",
            "caused_by": {
                "type": "illegal_state_exception",
                "reason": "extracted source isn't an object or an array"
            }
        }
    },
    "status": 500
}

Reasons why I think this is a bug:

  1. When tested against the Elasticsearch 6.8.4 Docker image, I do not see the same issue.
  2. I do not see this issue on Elassandra 6.2.3.28, and I assume queries, mappings, and documents remain backward compatible across non-major releases.

Other behaviors I noted: removing the inner_hits block from the query avoids the issue, and so does disabling the source in the inner_hits block. However, neither of these is a desirable solution.
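As a stopgap, the second workaround can be applied to existing query bodies programmatically. A minimal Python sketch, assuming that "disabling the source" means the standard Elasticsearch inner_hits option "_source": false; disable_inner_hits_source is a hypothetical helper name, not part of any client library:

```python
import copy


def disable_inner_hits_source(body):
    """Return a copy of a search body with "_source": false in every inner_hits block.

    Hypothetical helper: walks dicts and lists recursively and leaves the
    original body untouched.
    """
    result = copy.deepcopy(body)

    def visit(node):
        if isinstance(node, dict):
            if isinstance(node.get("inner_hits"), dict):
                # Inner hits are still returned, just without their source.
                node["inner_hits"]["_source"] = False
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    return result


query = {"query": {"nested": {
    "path": "parent-nested.nested-object1",
    "inner_hits": {},
    "query": {"match": {"parent-nested.nested-object1.field1": "hello"}},
}}}
print(disable_inner_hits_source(query)["query"]["nested"]["inner_hits"])
```

Working on a deep copy keeps the caller's query reusable, which matters if the same body is later sent unchanged to a fixed release.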

While triaging, I also found that removing nested-object2 from the document inserted in step 2 below seems to resolve the issue.
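In Elasticsearch 6.x this message appears to come from the fetch phase that builds inner hits: it looks up the nested field in the parent document's _source and expects either an object or an array. The following Python function is a simplified, hypothetical model of that lookup (not actual Elasticsearch or Elassandra code; ValueError stands in for Java's IllegalStateException). It shows one way the message can arise from this document: if the nested offset recorded for the hit pointed at the parent-nested element that only holds nested-object2, the lookup of nested-object1 would find nothing. That would be consistent with the observation above, but it is only an illustration, not a confirmed diagnosis.

```python
def extract_nested_source(source, identity):
    """Simplified model of walking _source along a nested hit identity.

    identity is a list of (field, offset) pairs, outermost first, e.g.
    [("parent-nested", 0), ("nested-object1", 0)]. At each step the field's
    value must be a list (take the element at offset) or a single object
    (treated as a one-element list); anything else, including a missing
    field, raises the message seen in the issue.
    """
    current = source
    for field, offset in identity:
        value = current.get(field) if isinstance(current, dict) else None
        if isinstance(value, list):
            items = value
        elif isinstance(value, dict):
            items = [value]
        else:
            raise ValueError("extracted source isn't an object or an array")
        current = items[offset]
    return current


# The document from step 2 of the reproduction.
doc = {
    "parent-nested": [
        {"nested-object1": [{"field1": "hello"}]},
        {"nested-object2": {}},
    ]
}

# Offsets that point at the element holding nested-object1 succeed.
print(extract_nested_source(doc, [("parent-nested", 0), ("nested-object1", 0)]))

# An offset that lands on the second element, which only has nested-object2,
# fails with the same message as the reported error.
try:
    extract_nested_source(doc, [("parent-nested", 1), ("nested-object1", 0)])
except ValueError as err:
    print(err)
```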

Steps to reproduce:

  1. Create index at nested_data with the following mapping:
    {
      "mappings": {
        "nested_data": {
          "properties": {
            "parent-nested": {
              "type": "nested",
              "properties": {
                "nested-object1": {
                  "type": "nested",
                  "properties": {
                    "field1": {
                      "type": "text"
                    },
                    "field2": {
                      "type": "text",
                      "fields": {
                        "keyword": {
                          "type": "keyword",
                          "ignore_above": 256
                        }
                      }
                    }
                  }
                },
                "nested-object2": {
                  "type": "nested",
                  "properties": {
                    "field2": {
                      "type": "text"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  2. Insert the following document in the above created index:
    {
      "parent-nested": [
        {
          "nested-object1": [
            {
              "field1": "hello"
            }
          ]
        },
        {
          "nested-object2": {}
        }
      ]
    }
  3. Run the following query against the index:
    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "parent-nested.nested-object1",
                "inner_hits": {},
                "query": {
                  "bool": {
                    "should": {
                      "bool": {
                        "must": {
                          "match": {
                            "parent-nested.nested-object1.field1": "hello"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
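The three steps above can be scripted for re-testing on other releases. A minimal Python sketch using only the standard library, assuming an Elassandra node reachable at http://localhost:9200 and a document id of 1 (both are assumptions, not part of the report); reproduction_requests, send, and run_reproduction are hypothetical helper names. In Elasticsearch 6.x the document is indexed under the mapping type, here nested_data.

```python
import json
import urllib.error
import urllib.request

# Assumptions (not from the report): node address and document id.
BASE_URL = "http://localhost:9200"
INDEX = "nested_data"
DOC_ID = "1"

# Step 1: the typed 6.x mapping from the report.
MAPPING = {"mappings": {"nested_data": {"properties": {
    "parent-nested": {
        "type": "nested",
        "properties": {
            "nested-object1": {
                "type": "nested",
                "properties": {
                    "field1": {"type": "text"},
                    "field2": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    },
                },
            },
            "nested-object2": {
                "type": "nested",
                "properties": {"field2": {"type": "text"}},
            },
        },
    },
}}}}

# Step 2: the document; per the report, dropping the nested-object2 element avoids the error.
DOCUMENT = {
    "parent-nested": [
        {"nested-object1": [{"field1": "hello"}]},
        {"nested-object2": {}},
    ]
}

# Step 3: the nested query with an (empty) inner_hits block.
MATCH = {"match": {"parent-nested.nested-object1.field1": "hello"}}
QUERY = {"query": {"bool": {"must": [{"nested": {
    "path": "parent-nested.nested-object1",
    "inner_hits": {},
    "query": {"bool": {"should": {"bool": {"must": MATCH}}}},
}}]}}}


def reproduction_requests():
    """Return the three reproduction steps as (method, path, body) tuples."""
    return [
        ("PUT", f"/{INDEX}", MAPPING),
        # In 6.x the document is indexed under the mapping type name.
        ("PUT", f"/{INDEX}/nested_data/{DOC_ID}?refresh=true", DOCUMENT),
        ("POST", f"/{INDEX}/_search", QUERY),
    ]


def send(method, path, body):
    """Send one JSON request and return (status, parsed response body)."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Error responses (such as a 500) still carry a JSON body.
        return err.code, json.load(err)


def run_reproduction():
    """Run all steps in order and return (status, body) of the final search."""
    result = None
    for method, path, body in reproduction_requests():
        result = send(method, path, body)
        print(method, path, result[0])
    return result
```

Calling run_reproduction() against an affected 6.8.4.x node should show the final search returning status 500 with the error quoted above; on a fixed release the same call should return the hit together with its inner hit. Keeping request construction separate from sending makes the bodies easy to inspect or reuse with another HTTP client.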

CQL schema for the index (keyspace nested_data):

CREATE TYPE nested_data.nested_data_parent_nested_nested_object1 ( field1 list<text>, field2 list<text> );

CREATE TYPE nested_data.nested_data_parent_nested_nested_object2 ( field2 list<text> );

CREATE TYPE nested_data.nested_data_parent_nested ( "nested-object1" list<frozen<nested_data_parent_nested_nested_object1>>, "nested-object2" list<frozen<nested_data_parent_nested_nested_object2>> );

CREATE TABLE nested_data.nested_data (
    "_id" text PRIMARY KEY,
    "parent-nested" list<frozen<nested_data_parent_nested>>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

CREATE CUSTOM INDEX elastic_nested_data_idx ON nested_data.nested_data () USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';

vroyer commented 4 years ago

Yes, this is a bug in Elassandra 6.8.4.x. A change in how Elasticsearch 6.5 stores nested documents internally was not properly implemented in Elassandra 6.8.4.x. This is fixed for the next release. Thanks for reporting a complete, reproducible test case.

convell commented 4 years ago

Thank you for the quick response, I look forward to the next release!

vroyer commented 4 years ago

Fixed in release 6.8.4.6
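Putting the versions from this thread together (reported on 6.8.4.0 through 6.8.4.5, not seen on 6.2.3.28, fixed in 6.8.4.6), a deployment script could flag affected nodes with a simple tuple comparison. A minimal Python sketch, assuming plain four-part numeric version strings; parse_version and has_inner_hits_bug are hypothetical helper names:

```python
def parse_version(text):
    """Parse a numeric Elassandra version such as "6.8.4.5" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Range taken from this issue: 6.8.4.x releases before the fix.
AFFECTED_FROM = parse_version("6.8.4.0")
FIXED_IN = parse_version("6.8.4.6")


def has_inner_hits_bug(version):
    """True if the version falls in the affected 6.8.4.x range (assumes numeric parts only)."""
    return AFFECTED_FROM <= parse_version(version) < FIXED_IN


for v in ["6.2.3.28", "6.8.4.0", "6.8.4.5", "6.8.4.6"]:
    print(v, has_inner_hits_bug(v))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "6.8.4.10" sorting before "6.8.4.6".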