jprante / elasticsearch-langdetect

A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector
Apache License 2.0

Can't aggregate ? #70

Open adrianlungu opened 7 years ago

adrianlungu commented 7 years ago

Hello,

I've tried creating an aggregation using each of the examples in the README, without any luck so far.

If I try to aggregate on the stored langdetect field, ES tells me the field needs fielddata: true. However, it does not allow me to enable fielddata on the langdetect field, since it is not a text field.

I have also tried the lang subfield mentioned in the README, but the aggregation on it returns no buckets.

Example:

PUT /test
{
   "mappings": {
      "docs": {
         "properties": {
            "text": {
               "type": "langdetect",
               "languages" : [ "en", "de", "fr" ],
               "store": true
            }
         }
      }
   }
}
PUT /test/docs/1
{
    "text" : "Oh, say can you see by the dawn`s early light, What so proudly we hailed at the twilight`s last gleaming?"
}
GET /test/_search
{
    "query": {
        "bool": {
            "must": [
               {
                   "query_string": {
                      "query": "text:*"
                   }
               }
            ]
        }        
    },
    "aggregations": {
        "language": {
            "terms": {"field": "text.lang"}    
        }
    }
}

Response

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test",
            "_type": "docs",
            "_id": "1",
            "_score": 1,
            "_source": {
               "text": "Oh, say can you see by the dawn`s early light, What so proudly we hailed at the twilight`s last gleaming?"
            }
         }
      ]
   },
   "aggregations": {
      "language": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": []
      }
   }
}

Am I missing something, or doing this wrong?

Thanks!

jprante commented 7 years ago

My langdetect plugin creates a new field type, which is not a string/text/keyword type. That may be the reason the aggregation does not work.

I will look into this issue. Maybe the idea of a new field type was over-engineered, and I can find a simple way to map it to a straightforward string/text/keyword field type.

adrianlungu commented 7 years ago

Maybe add the possibility to set fielddata? Or the ability to add a subfield to the language field, so that the language is also stored as a keyword that aggregations can run on?
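
Until something like that exists in the plugin, one possible workaround is to keep the language code in an ordinary keyword field and aggregate on that instead. The following is only a minimal sketch, written in the same typed-mapping style as the example above. It assumes the client already knows the language code when it indexes the document; the index name test2 and the field name lang are hypothetical and are not part of the plugin.

```
# Hypothetical index "test2": plain text field plus a separate keyword field "lang"
PUT /test2
{
   "mappings": {
      "docs": {
         "properties": {
            "text": { "type": "text" },
            "lang": { "type": "keyword" }
         }
      }
   }
}

# Assumption: the client supplies the language code itself at index time
PUT /test2/docs/1
{
    "text": "Oh, say can you see by the dawn`s early light, What so proudly we hailed at the twilight`s last gleaming?",
    "lang": "en"
}

# Terms aggregation on the keyword field; "size": 0 skips the hits and returns only buckets
GET /test2/_search
{
    "size": 0,
    "aggregations": {
        "language": {
            "terms": { "field": "lang" }
        }
    }
}
```

Since keyword fields are aggregatable without fielddata, this sidesteps the fielddata error entirely. Once the index has refreshed, the language aggregation should contain a bucket for en. The obvious cost is that detection no longer happens inside the mapping, so this is a stopgap rather than a fix for the langdetect field type itself.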