strapdata / elassandra

Elassandra = Elasticsearch + Apache Cassandra
http://www.elassandra.io
Apache License 2.0

ES Discover doesn't map 'List' column correctly #330

Closed: jasonever closed this issue 4 years ago

jasonever commented 4 years ago

Hello,

I need your help with the following questions:

1) I've created a Cassandra table:

        CREATE TABLE ip.ipv (
            ip_from text,
            ip_to text,
            version int,
            last_update list<bigint>,
            PRIMARY KEY (ip_from, ip_to, version)
        ) WITH CLUSTERING ORDER BY (ip_to ASC, version ASC);
        CREATE CUSTOM INDEX elastic_ipv_idx ON ip.ipv () USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';

Then I index the table in ES with:

PUT /ip
{
  "settings": {
      "keyspace":"ip"
  },
  "mappings": {
      "ipv" : { "discover" : ".*" }
  }
}

When I check the mapping for the table in Elasticsearch, I see that last_update has the type long, while I expected a list. How can I fix this?

{
  "ip" : {
    "aliases" : { },
    "mappings" : {
      "ipv" : {
        "properties" : {
          "ip_from" : {
            "type" : "keyword",
            "cql_collection" : "singleton",
            "cql_partition_key" : true,
            "cql_primary_key_order" : 0
          },
          "ip_to" : {
            "type" : "keyword",
            "cql_collection" : "singleton",
            "cql_primary_key_order" : 1
          },
          "last_update" : {
            "type" : "long"
          },

2) When I try to run an aggregation query from cqlsh, I get the error below (although the same query works in Kibana):

cqlsh> select * from ip.ipv where es_query='{"aggs":{"version":{"stats":{"field":"version"}}}}' and es_options='indices=ip.ipv' ALLOW FILTERING;
ServerError: java.lang.IllegalArgumentException: unsupported aggregation type=[stats] name=[version]
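Because es_query carries a JSON document inside a CQL string literal, any single quote inside the JSON has to be doubled, which is how CQL escapes quotes. A minimal sketch of assembling such a statement from a query dict; the helper name `build_es_select` is hypothetical, it does not validate keyspace or table names, and it only handles quoting, not which es_options Elassandra accepts:

```python
import json


def build_es_select(keyspace: str, table: str, query: dict, es_options: str) -> str:
    """Hypothetical helper: build a CQL SELECT that passes `query` via es_query.

    Both values are embedded in CQL string literals, so single quotes are
    escaped by doubling them. Identifiers are used as given (not validated).
    """
    es_query = json.dumps(query, separators=(",", ":")).replace("'", "''")
    options = es_options.replace("'", "''")
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE es_query='{es_query}' AND es_options='{options}' ALLOW FILTERING;"
    )


# Rebuild the statement from the question above.
stmt = build_es_select(
    "ip", "ipv",
    {"aggs": {"version": {"stats": {"field": "version"}}}},
    "indices=ip.ipv",
)
print(stmt)
```

Generating the JSON with `json.dumps` instead of typing it by hand also guarantees the query is well-formed before it reaches Cassandra.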

3) Regarding virtual indices and partitioned indices: I want to use the first element of the last_update list, which contains time-in-millis values, as the partition_function argument. My attempt looks something like this:

curl -XPUT -H 'Content-Type: application/json' "http://localhost:9200/ip.ipv_2020" -d '{
  "settings": {
      "keyspace":"ip",
      "index.partition_function":"toYearIndex ip.ipv_{0,date,yyyy} last_update.0",
      "index.partition_function_class":"MessageFormatPartitionFunction",
      "index.virtual_index":"ip.ipv"
  },
  "mappings": {
      "ipv" : { "discover" : ".*" }
  }
}'

Is this correct? Is last_update.0 the right way to reference the first element? Does the toYearIndex function handle time in milliseconds, or only dates? Or is there another function for this?

Thanks,

leleueri commented 4 years ago

Hi,

Here are the answers to your three points:

  1. The default value of cql_collection is list, so the output is correct.

  2. You have to request JSON output in es_options to execute this query from CQL:

    
    cqlsh> select * from ip.ipv where es_query='{"aggs":{"version":{"stats":{"field":"version"}}}}' and es_options='indices=ip;json=true' ALLOW FILTERING;

    version
    -----------------------------------------------------
    {"count":3,"min":1.0,"max":3.0,"avg":2.0,"sum":6.0}

    (1 rows)


  3. Currently, lists are not supported as partition_function parameters. You have to use a column with a simple type (text, integer, long, ...), not a collection.
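To illustrate what a yearly partition function computes from a scalar column, here is a minimal Python sketch that turns one epoch-millisecond value into an index name of the form ip.ipv_yyyy. It only mimics the `{0,date,yyyy}` part of the pattern and assumes the timestamp is read in UTC; Elassandra itself evaluates the Java MessageFormat pattern, and the time zone it applies is not stated in this thread. The helper name `year_index_name` is hypothetical:

```python
from datetime import datetime, timezone


def year_index_name(prefix: str, epoch_millis: int) -> str:
    """Hypothetical helper mimicking a `prefix_{0,date,yyyy}` pattern.

    Assumption: the epoch-millisecond value is interpreted in UTC; the real
    partition function may use a different time zone.
    """
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return f"{prefix}_{moment.year:04d}"


# 1581926250928 ms falls on 2020-02-17 (UTC), so the row maps to ip.ipv_2020.
print(year_index_name("ip.ipv", 1581926250928))
```

Each row must yield exactly one index name, which is why a single scalar value works here while a list column has no single value to format.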

jasonever commented 4 years ago

Thanks @leleueri. Just to confirm my understanding:

1- Do you mean I'll always get only a single long value from ES, even though the column in Cassandra is defined as list<bigint>?

leleueri commented 4 years ago

Hi,

If you have a list of longs in Cassandra, you will also have a list of longs in ES. By default, Elassandra indexes a Cassandra column as a list of the column's type, and in that case cql_collection is not shown in the mapping, because list is its default value.

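In this example, I create a list<bigint> column and a bigint column. The ES mapping shows cql_collection singleton for the signelLong column and only the type long for the listOfLong column (unquoted CQL identifiers are lowercased, hence listoflong and signellong in the mapping).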

create table test.table_ex(id bigint primary key, listOfLong list<bigint>, signelLong bigint);

curl -XPUT -H 'Content-Type: application/json' "http://localhost:9200/test/" -d '{
  "settings": {
      "keyspace":"test" 
  },
  "mappings": {
      "table_ex" : { "discover" : ".*" }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}

curl -XGET -H 'Content-Type: application/json' "http://localhost:9200/test/?pretty" 
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "table_ex" : {
        "properties" : {
          "id" : {
            "type" : "long",
            "cql_collection" : "singleton",
            "cql_partition_key" : true,
            "cql_primary_key_order" : 0
          },
          "listoflong" : {
            "type" : "long"
          },
          "signellong" : {
            "type" : "long",
            "cql_collection" : "singleton"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "keyspace" : "test",
        "number_of_shards" : "1",
        "provided_name" : "test",
        "creation_date" : "1581926250928",
        "number_of_replicas" : "0",
        "uuid" : "u2bRVgRZSEm1QU_T_UWZUg",
        "version" : {
          "created" : "6080499"
        }
      }
    }
  }
}
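Since cql_collection is left out of the mapping when it equals the default list, code that inspects a mapping has to fill that default in itself. A minimal sketch reading a trimmed copy of the GET /test/ response above; the helper name `cql_collections` is hypothetical and assumes the Elasticsearch 6.x layout with a type name under "mappings":

```python
import json

# Per this thread, Elassandra omits cql_collection when it is "list" (the default).
DEFAULT_CQL_COLLECTION = "list"


def cql_collections(mapping: dict, index: str, doc_type: str) -> dict:
    """Hypothetical helper: return {field: cql_collection}, filling in the default."""
    props = mapping[index]["mappings"][doc_type]["properties"]
    return {
        name: field.get("cql_collection", DEFAULT_CQL_COLLECTION)
        for name, field in props.items()
    }


# Trimmed copy of the mapping returned for the test index.
response = json.loads("""
{"test": {"mappings": {"table_ex": {"properties": {
  "id": {"type": "long", "cql_collection": "singleton"},
  "listoflong": {"type": "long"},
  "signellong": {"type": "long", "cql_collection": "singleton"}
}}}}}
""")

print(cql_collections(response, "test", "table_ex"))
# {'id': 'singleton', 'listoflong': 'list', 'signellong': 'singleton'}
```

Read this way, a bare "type": "long" means a list of longs, matching the list<bigint> column in Cassandra.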

vroyer commented 4 years ago

ES fields are stored as lists in Cassandra by default, and an ES long corresponds to a Cassandra bigint. It works as expected!