strapdata / elassandra

Elassandra = Elasticsearch + Apache Cassandra
http://www.elassandra.io
Apache License 2.0
Indexing fails when request contains extra fields and dynamic mapping is set to false #193

Open krishkoneru opened 6 years ago

krishkoneru commented 6 years ago

Elassandra version: Elassandra 5.5.0.13

Plugins installed: [None]

JVM version: java version "1.8.0_151" Java(TM) SE Runtime Environment (build 1.8.0_151-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version: Debian Jessie Docker Container running in CoreOs

Description of the problem including expected versus actual behavior:

Elassandra rejects all writes that contain any extra fields when dynamic mapping is disabled.

{"error":{"root_cause":[{"type":"mapper_exception","reason":"Unmapped field [somethingelse]"}],"type":"mapper_exception","reason":"Unmapped field [somethingelse]"},"status":500}

Expected: Elassandra should accept the write and index only the fields specified in the mapping, as Elasticsearch does. https://www.elastic.co/guide/en/elasticsearch/reference/5.5/dynamic.html
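For reference, the Elasticsearch documentation linked above describes three values for `dynamic`: `true` (new fields are added to the mapping), `false` (new fields are ignored for indexing but still appear in `_source`), and `strict` (an exception is thrown). The snippet below is a toy Python model of those rules for top-level fields only. It is not Elasticsearch or Elassandra code, and `apply_dynamic` and its return shape are hypothetical names chosen for illustration.

```python
def apply_dynamic(doc, mapped_fields, dynamic):
    """Toy model of the documented `dynamic` setting (top-level fields only).

    Returns (indexed, source): the fields that would be indexed and the
    document as it would be kept in _source. Hypothetical helper, not a real API.
    """
    unmapped = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict" and unmapped:
        # "strict": reject the document outright
        raise ValueError("unmapped field(s): " + ", ".join(unmapped))
    if dynamic is True:
        # true: new fields would be added to the mapping and indexed
        indexed = dict(doc)
    else:
        # false: unmapped fields are not indexed, but the write is accepted
        indexed = {k: v for k, v in doc.items() if k in mapped_fields}
    return indexed, dict(doc)


doc = {"id": 0, "name": "Maldonado Morrison", "somethingelse": {"id": 0}}
indexed, source = apply_dynamic(doc, {"id", "name", "tags"}, dynamic=False)
print(sorted(indexed))  # ['id', 'name']
print(sorted(source))   # ['id', 'name', 'somethingelse']
```

Under this model, the behavior reported here for Elassandra with `dynamic: false` corresponds to the `strict` branch rather than the `false` branch.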

Steps to reproduce:

  1. Create an index with mappings and set dynamic to false
    
    ~ $ curl -XPUT localhost:9200/dynamicmappingtest -d ' {
    >
    >  "mappings": {
    >       "_default_": {
    >         "_all" : { "enabled": false }
    >       },
    >     "friends": {
    >       "dynamic": false,
    >       "properties": {
    >         "id": {"type" : "keyword" },
    >         "name": {"type" : "keyword" },
    >         "tags": {
    >             "properties": {
    >             "id": {"type" : "integer" },
    >             "name": {"type" : "keyword" }
    >             }
    >         }
    >       }
    >     }
    >
    >  }
    > }'

{"acknowledged":true,"shards_acknowledged":true}

 2. Index a document without extra fields (succeeds), then one with an extra field that is not defined in the mapping (fails)

~ $ curl -XPUT localhost:9200/dynamicmappingtest/friends/1 -d '

{
  "id": 0,
  "name": "Maldonado Morrison",
  "tags": {
    "id": 0,
    "name": "Peters Pennington"
  }

}'

{"_index":"dynamicmappingtest","_type":"friends","_id":"1","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"created":true}

~ $ curl -H 'Content-Type: application/json' -XPUT localhost:9200/dynamicmappingtest/friends/2 -d '

{
  "id": 0,
  "name": "Maldonado Morrison",
  "tags": {
    "id": 0,
    "name": "Peters Pennington"
  },
  "somethingelse": {
    "id": 0,
    "name": "Peters Pennington"
  }

}'

{"error":{"root_cause":[{"type":"mapper_exception","reason":"Unmapped field [somethingelse]"}],"type":"mapper_exception","reason":"Unmapped field [somethingelse]"},"status":500}

 **Same test case on Elasticsearch**

curl -XPUT -H 'Content-Type: application/json' localhost:9200/dynamicmappingtest -d '
{
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    },
    "friends": {
      "dynamic": false,
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "tags": {
          "properties": {
            "id": { "type": "integer" },
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}'

{"acknowledged":true,"shards_acknowledged":true,"index":"dynamicmappingtest"}

curl -H 'Content-Type: application/json' -XPUT localhost:9200/dynamicmappingtest/friends/1 -d '

{
"friends": {
  "id": 0,
  "name": "Maldonado Morrison",
  "tags": {
    "id": 0,
    "name": "Peters Pennington"
  }
}

}'

{"_index":"dynamicmappingtest","_type":"friends","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

curl -H 'Content-Type: application/json' -XPUT localhost:9200/dynamicmappingtest/friends/2 -d '

{
  "id": 0,
  "name": "Maldonado Morrison",
  "tags": {
    "id": 0,
    "name": "Peters Pennington"
  },
  "somethingelse": {
    "id": 0,
    "name": "Peters Pennington"
  }

}'

{"_index":"dynamicmappingtest","_type":"friends","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

vroyer commented 6 years ago

Yes. In Elassandra, data is persisted in Cassandra (no longer in Elasticsearch), so indexing a document that contains a field not mapped to Cassandra would cause data loss. That is why an exception is thrown.

If you really need this feature, Elassandra could be improved to accept such requests when _source is enabled, because in that case the full JSON document would also be stored in a dedicated Cassandra column (this of course introduces an overhead).
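Until such a change exists, one possible client-side workaround is to drop unmapped fields before sending the request. The following is a minimal Python sketch under that assumption, not part of Elassandra. `strip_unmapped` and `FRIENDS_PROPERTIES` are hypothetical names, and the mapping is copied by hand from the `friends` type in the report rather than fetched from the cluster.

```python
def strip_unmapped(doc, properties):
    """Return a copy of `doc` keeping only fields listed in `properties`.

    `properties` is the "properties" object of a type mapping. Object
    sub-mappings are followed recursively. Hypothetical client-side helper.
    """
    kept = {}
    for name, value in doc.items():
        field = properties.get(name)
        if field is None:
            continue  # unmapped field: dropped before the request is sent
        sub = field.get("properties")
        if sub is not None and isinstance(value, dict):
            kept[name] = strip_unmapped(value, sub)
        elif sub is not None and isinstance(value, list):
            # arrays of objects: filter each object element
            kept[name] = [
                strip_unmapped(v, sub) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            kept[name] = value
    return kept


# Mapping of the `friends` type, copied by hand from the report (assumption).
FRIENDS_PROPERTIES = {
    "id": {"type": "keyword"},
    "name": {"type": "keyword"},
    "tags": {
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "keyword"},
        }
    },
}

doc = {
    "id": 0,
    "name": "Maldonado Morrison",
    "tags": {"id": 0, "name": "Peters Pennington"},
    "somethingelse": {"id": 0, "name": "Peters Pennington"},
}
clean = strip_unmapped(doc, FRIENDS_PROPERTIES)
print(sorted(clean))  # ['id', 'name', 'tags']
```

Note that this discards the extra fields for good: it makes the data loss described above an explicit client-side choice instead of something the server does silently.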