strapdata / elassandra

Elassandra = Elasticsearch + Apache Cassandra
http://www.elassandra.io
Apache License 2.0

Elassandra mappings with type coercing aren't behaving the same as elasticsearch #323

Closed (epsniff closed this issue 4 years ago)

epsniff commented 4 years ago

Elassandra version: 6.8.4.0
JVM version (java -version): docker image
OS version (uname -a if on a Unix-like system): host system is macOS, but I also tested it in GKE.

Description of the problem including expected versus actual behavior: When using a mapping template that enables type coercion, Elassandra does not behave the same as Elasticsearch. Instead of indexing the document, it returns a 500 error and an exception.

Steps to reproduce:

In both examples we insert a document with a field mapped as an integer type (long), and in both cases the field's value is submitted as a string.

Testing Elasticsearch 6.8

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.8.6
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.8.6
export HOST=localhost
export INDEX=ents_type_site1
echo '{
  "index_patterns": "ents_*",
  "mappings": {
    "_doc": {
      "properties": {
        "_created": {
          "type": "date"
        }
      },
      "dynamic_templates": [
        {
          "integer_fields": {
            "match": "int_*",
            "mapping": {
              "coerce": true,
              "type": "long"
            }
          }
        }
      ]
    }
  }
}' | http PUT http://$HOST:9200/_template/ent_table_template
{
    "acknowledged": true
}

echo '{"int__eventcnt":"1"}' |   http POST http://$HOST:9200/$INDEX/_doc/a1bc=:1cd==
{
    "_id": "a1bc=:1cd==",
    "_index": "ents_type_site1",
    "_primary_term": 1,
    "_seq_no": 0,
    "_shards": {
        "failed": 0,
        "successful": 1,
        "total": 2
    },
    "_type": "_doc",
    "_version": 1,
    "result": "created"
}

Testing elassandra 6.8

docker run -p 9200:9200 -p 9300:9300 --name node0 -d strapdata/elassandra:6.8.4.0
export HOST=localhost
export INDEX=ents_type_site1
echo '{
  "index_patterns": "ents_*",
  "mappings": {
    "_doc": {
      "properties": {
        "_created": {
          "type": "date"
        }
      },
      "dynamic_templates": [
        {
          "integer_fields": {
            "match": "int_*",
            "mapping": {
              "coerce": true,
              "type": "long"
            }
          }
        }
      ]
    }
  }
}' | http PUT http://$HOST:9200/_template/ent_table_template
{
    "acknowledged": true
}

echo '{"int__eventcnt":"1"}' |   http POST http://$HOST:9200/$INDEX/_doc/a1bc=:1cd==
{
    "error": {
        "reason": "java.lang.String cannot be cast to java.lang.Number",
        "root_cause": [
            {
                "reason": "[172.17.0.2][172.17.0.2:9300][indices:data/write/bulk[s][p]]",
                "type": "remote_transport_exception"
            }
        ],
        "type": "class_cast_exception"
    },
    "status": 500
}


vroyer commented 4 years ago

Elassandra is not Elasticsearch, so don't force discovery to single-node; follow the Elassandra quick-start guide instead.

epsniff commented 4 years ago

@vroyer what? This has nothing to do with discovery. This is about index mappings and string coercion, i.e. Elassandra can't coerce the string "100" into the number 100.

See:

"reason": "java.lang.String cannot be cast to java.lang.Number",

epsniff commented 4 years ago

I only used Docker in the reproduction steps to keep things simple. The same exception also happens when you use the _bulk API.
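For reference, a _bulk request carrying the same document can be built as below. This is a minimal Go sketch, not the reporter's code; bulkMeta and bulkBody are hypothetical helpers. The _bulk body is newline-delimited JSON: an action-and-metadata line followed by the document source, each terminated by a newline (Elasticsearch 6.x still accepts _type in the metadata line).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bulkMeta is the action-and-metadata line of an Elasticsearch 6.x _bulk request.
// Hypothetical helper type for illustration.
type bulkMeta struct {
	Index struct {
		Index string `json:"_index"`
		Type  string `json:"_type"`
		ID    string `json:"_id"`
	} `json:"index"`
}

// bulkBody builds an NDJSON _bulk body that indexes a single document.
// json.Encoder.Encode appends "\n" after each value, so every line
// (including the last one) ends with a newline, as _bulk requires.
func bulkBody(index, id string, doc any) ([]byte, error) {
	var meta bulkMeta
	meta.Index.Index = index
	meta.Index.Type = "_doc"
	meta.Index.ID = id

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	if err := enc.Encode(meta); err != nil {
		return nil, err
	}
	if err := enc.Encode(doc); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Same document as in the reproduction: the long field is sent as a string.
	doc := map[string]string{"int__eventcnt": "1"}
	body, err := bulkBody("ents_type_site1", "a1bc=:1cd==", doc)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

The printed body could then be piped to the endpoint, for example with go run main.go | http POST http://$HOST:9200/_bulk Content-Type:application/x-ndjson.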

For our evaluation we are testing Elassandra in Kubernetes, which I set up using Google's application Marketplace.

The reason we use coercion is that converting int64 values to float64 for JSON loses precision (float64 represents integers exactly only up to 2^53). So, for large numbers we marshal the int64 as a string and let Elasticsearch coerce it back into a numeric type.

epsniff commented 4 years ago

@vroyer I think I see the confusion. The reproduction steps start both an Elasticsearch node and an Elassandra node. The "discovery.type=single-node" setting was only used in the Elasticsearch example; please keep reading. The important part is the result of inserting a single document into Elasticsearch versus Elassandra.

vroyer commented 4 years ago

Sorry for closing the ticket; I thought you were trying to run Elassandra with single-node discovery.

Indeed, quoted numbers are not interpreted as numbers by Elassandra, while Elasticsearch handles them correctly. Without quotes, Elassandra works fine:

put "$INDEX/_doc/a1bc=:1cd==?pretty" '{"int__eventcnt":1}'
{
  "_index" : "ents_type_site1",
  "_type" : "_doc",
  "_id" : "a1bc=:1cd==",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

We are going to fix that quickly. Thanks for reporting the issue.

leleueri commented 4 years ago

Hi,

Elassandra 6.8.4.1 has just been released and fixes your issue:


curl -X PUT -H "Content-Type: application/json" "http://localhost:9200/_template/ent_table_template" -d '{
  "index_patterns": "ents_*",
  "mappings": {
    "_doc": {
      "properties": {
        "_created": {
          "type": "date"
        }
      },
      "dynamic_templates": [
        {
          "integer_fields": {
            "match": "int_*",
            "mapping": {
              "coerce": true,
              "type": "long"
            }
          }
        }
      ]
    }
  }
}'

curl -X POST -H "Content-Type: application/json" "http://localhost:9200/ents_type_site1/_doc/a1bc=:1cd==" -d '{"int__eventcnt":"1"}'
{"_index":"ents_type_site1","_type":"_doc","_id":"a1bc=:1cd==","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

curl -X POST -H "Content-Type: application/json" "http://localhost:9200/ents_type_site1/_search?pretty"
{
  "took" : 60,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ents_type_site1",
        "_type" : "_doc",
        "_id" : "a1bc=:1cd==",
        "_score" : 1.0,
        "_source" : {
          "int__eventcnt" : 1
        }
      }
    ]
  }
}

The Docker image is available on Docker Hub.