jprante / elasticsearch-plugin-bundle

A bundle of useful Elasticsearch plugins
GNU Affero General Public License v3.0

How to integrate the plugin? #6

Closed Schaumbaum closed 9 years ago

Schaumbaum commented 9 years ago

I'm trying to set up a new search server that can index German documents, and I discovered your plugin bundle, which seems to cover my needs. Unfortunately, I'm not able to integrate the plugin properly. Here is an example of what I have tried:

PUT http://huclmaid01:9200/movies
{
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "umlaut": {
                        "type": "german_normalize"
                    }
                },
                "tokenizer": {
                    "umlaut": {
                        "type": "standard",
                        "filter": "umlaut"
                    }
                }
            }
        }
    }
}

The tokens still contain the umlauts:

POST http://huclmaid01:9200/movies/_analyze?tokenizer=umlaut
{
    "text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}

What am I doing wrong?

jprante commented 9 years ago

Do you have documents in index movies, and a mapping for index movies, with a field configured with tokenizer umlaut?

Schaumbaum commented 9 years ago

I used the first example only to see the generated tokens. I have also tried mapping the analyzer to a field, but without the expected result.

This is my test mapping (GET http://localhost:9200/movies/movie/_mapping):

{
    "movies": {
        "mappings": {
            "movie": {
                "properties": {
                    "message": {
                        "type": "string",
                        "analyzer": "deutsch"
                    }
                }
            }
        }
    }
}

And these are my settings (GET http://localhost:9200/movies/_settings):

{
    "movies": {
        "settings": {
            "index": {
                "creation_date": "1433778178966",
                "uuid": "RHlpyXunSOucBIg2vuLaJg",
                "analysis": {
                    "analyzer": {
                        "deutsch": {
                            "tokenizer": "umlaut"
                        }
                    },
                    "filter": {
                        "umlaut": {
                            "type": "german_normalize"
                        }
                    },
                    "tokenizer": {
                        "umlaut": {
                            "type": "standard",
                            "filter": "umlaut"
                        }
                    }
                },
                "number_of_replicas": "1",
                "number_of_shards": "5",
                "version": {
                    "created": "1050299"
                }
            }
        }
    }
}

The plugin seems to be installed properly (from GET http://localhost:9200/_nodes):

"plugins": [
    {
        "name": "plugin-bundle-1.5.2.0-e6ec36a",
        "version": "1.5.2.0",
        "description": "A collection of useful plugins",
        "jvm": true,
        "site": false
    }
]

And my only document (GET http://localhost:9200/movies/movie/1):

{
    "_index": "movies",
    "_type": "movie",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "message": "Ein schöner Tag in Köln im Café an der Straßenecke"
    }
}

I would expect that it would be found if I query for "koln", but it is only found by the search term "köln". Maybe I have missed something?

jprante commented 9 years ago

Try this:

DELETE /movies

PUT /movies
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "deutsch": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "german_normalize"
                        ]
                    }
                }
            },
            "number_of_replicas": "0",
            "number_of_shards": "1"
        }
    }
}

GET /movies/_settings

POST /movies/movies/_mapping
{
    "properties": {
        "message": {
            "type": "string",
            "analyzer": "deutsch"
        }
    }
}

GET /movies/_mapping

PUT /movies/movies/1
{
    "message" : "Ein schöner Tag in Köln im Café an der Straßenecke"
}

POST /movies/movies/_search
{
    "query": {
        "match": {
           "message": "koln"
        }
    }
}

POST /movies/_analyze?analyzer=deutsch
{
    "text" : "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}
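To see why the query "koln" can match a document containing "Köln" once the analyzer above is in place, here is a rough Python sketch of the chain tokenizer "standard", then "lowercase", then "german_normalize". It is only an approximation and not the plugin's code: the regex tokenizer, the `FOLD` table, and the function name `analyze_sketch` are assumptions made for illustration, and the real filter may fold more than the four characters modelled here.

```python
import re

# Assumption: only these four foldings are modelled; the real
# german_normalize filter may apply additional rules.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def analyze_sketch(text):
    """Hypothetical stand-in for tokenizer=standard, filter=[lowercase, german_normalize]."""
    tokens = re.findall(r"\w+", text)            # crude word tokenizer, not the real "standard" one
    tokens = [t.lower() for t in tokens]         # the "lowercase" step
    return [t.translate(FOLD) for t in tokens]   # the umlaut and ß folding step


# The same chain runs on the document at index time and on the query at search time.
doc_terms = analyze_sketch("Ein schöner Tag in Köln im Café an der Straßenecke")
query_terms = analyze_sketch("koln")

print(doc_terms)
print(set(query_terms) <= set(doc_terms))  # True: "koln" is among the indexed terms
```

Because both sides go through the same analyzer, "Köln" is indexed as "koln", and a match query for either "koln" or "Köln" ends up looking for that same term.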

Schaumbaum commented 9 years ago

As soon as you stop screwing it up, it actually works ;-) I somehow misinterpreted the examples. Thank you very much.
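For later readers, the difference between the two configurations in this thread comes down to where the token filter is listed. Token filters are applied through an analyzer's "filter" list; in the original settings the filter name sat inside the tokenizer definition, where the standard tokenizer has no such option, so the "deutsch" analyzer ran with no filters at all. The Python sketch below only inspects the two settings bodies as plain dictionaries; the helper name `analyzer_filters` is hypothetical and nothing here talks to Elasticsearch.

```python
def analyzer_filters(index_settings, analyzer_name):
    """Hypothetical helper: return the token filters an analyzer definition lists."""
    analysis = index_settings["settings"]["index"]["analysis"]
    return list(analysis["analyzer"][analyzer_name].get("filter", []))


# Original settings from the question: the filter is referenced from the tokenizer.
original = {"settings": {"index": {"analysis": {
    "analyzer": {"deutsch": {"tokenizer": "umlaut"}},
    "filter": {"umlaut": {"type": "german_normalize"}},
    "tokenizer": {"umlaut": {"type": "standard", "filter": "umlaut"}},
}}}}

# Settings from the answer: the filters are listed on the analyzer itself.
corrected = {"settings": {"index": {"analysis": {
    "analyzer": {"deutsch": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "german_normalize"],
    }},
}}}}

print(analyzer_filters(original, "deutsch"))   # []
print(analyzer_filters(corrected, "deutsch"))  # ['lowercase', 'german_normalize']
```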