Closed by Schaumbaum 9 years ago
Do you have documents in the index movies, and a mapping for the index movies, with a field configured with the tokenizer umlaut?
I used the first example to see the generated tokens. I have also tried to map it to a field, but without the expected result.
This is my test mapping (GET http://localhost:9200/movies/movie/_mapping): { "movies": { "mappings": { "movie": { "properties": { "message": { "type": "string", "analyzer": "deutsch" } } } } } }
And these are my settings (GET http://localhost:9200/movies/_settings): { "movies": { "settings": { "index": { "creation_date": "1433778178966", "uuid": "RHlpyXunSOucBIg2vuLaJg", "analysis": { "analyzer": { "deutsch": { "tokenizer": "umlaut" } }, "filter": { "umlaut": { "type": "german_normalize" } }, "tokenizer": { "umlaut": { "type": "standard", "filter": "umlaut" } } }, "number_of_replicas": "1", "number_of_shards": "5", "version": { "created": "1050299" } } } } }
The plugin seems to be installed properly (from GET http://localhost:9200/_nodes): "plugins": [ { "name": "plugin-bundle-1.5.2.0-e6ec36a", "version": "1.5.2.0", "description": "A collection of useful plugins", "jvm": true, "site": false } ]
And my only document (GET http://localhost:9200/movies/movie/1): { "_index": "movies", "_type": "movie", "_id": "1", "_version": 1, "found": true, "_source": { "message": "Ein schöner Tag in Köln im Café an der Straßenecke" } }
I would expect that it would be found if I query for "koln", but it is only found by the search term "köln". Maybe I have missed something?
Try this
DELETE /movies

PUT /movies
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "deutsch": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "german_normalize"
            ]
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "1"
    }
  }
}

GET /movies/_settings

POST /movies/movies/_mapping
{
  "properties": {
    "message": {
      "type": "string",
      "analyzer": "deutsch"
    }
  }
}

GET /movies/_mapping

PUT /movies/movies/1
{
  "message": "Ein schöner Tag in Köln im Café an der Straßenecke"
}

POST /movies/movies/_search
{
  "query": {
    "match": {
      "message": "koln"
    }
  }
}

POST /movies/_analyze?analyzer=deutsch
{
  "text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}
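To illustrate why the match query for "koln" finds the document with this analyzer, here is a rough Python sketch of the deutsch chain (standard tokenizer, then lowercase, then german_normalize). It is only an approximation under stated assumptions: the regex stands in for the standard tokenizer, only the umlaut and ß rules of german_normalize are modelled, and the names analyze and matches are made up for this example.

```python
import re

# Assumption: only the umlaut and sharp-s folding of german_normalize is modelled here.
# The real filter applies further rules (for example for ae/oe/ue) that are left out.
UMLAUT_MAP = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def analyze(text):
    """Hypothetical stand-in for the 'deutsch' analyzer: tokenize, lowercase, fold umlauts."""
    tokens = re.findall(r"\w+", text)  # crude approximation of the standard tokenizer
    return [token.lower().translate(UMLAUT_MAP) for token in tokens]


def matches(document, query):
    """A match query analyzes the query text with the same chain and looks for shared terms."""
    return bool(set(analyze(document)) & set(analyze(query)))


doc = "Ein schöner Tag in Köln im Café an der Straßenecke"
print(analyze(doc))
print(matches(doc, "koln"), matches(doc, "köln"))
```

Because both the indexed text and the query text pass through the same chain, "Köln", "köln" and "koln" all end up as the same term in this sketch.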
As soon as you stop screwing it up, it actually works ;-) I somehow misinterpreted the examples. Thank you very much
I'm trying to set up a new search server with the ability to index German documents. I came across your plugin bundle, which seems to cover my needs. Unfortunately, I'm not able to integrate the plugin properly. Here is an example of what I have tried: PUT http://huclmaid01:9200/movies { "settings":{ "index":{ "analysis":{ "filter":{ "umlaut":{ "type":"german_normalize" } }, "tokenizer" : { "umlaut" : { "type":"standard", "filter" : "umlaut" } } } } } }
The tokens still contain the umlauts: POST http://huclmaid01:9200/movies/_analyze?tokenizer=umlaut { "text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet" }
What am I doing wrong?
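A likely cause, as far as I can tell: the settings above attach the token filter to the tokenizer definition, whereas the working configuration in this thread puts the filter chain on a custom analyzer, and calling _analyze with tokenizer=umlaut runs only the tokenizer. The small Python sketch below is a hypothetical helper (misplaced_filters is not part of Elasticsearch or the plugin) that flags this pattern in an analysis settings dictionary.

```python
def misplaced_filters(analysis):
    """Return the names of tokenizer definitions that carry a "filter" key.

    Hypothetical check for illustration: token filters are expected on an
    analyzer definition, so a "filter" key under "tokenizer" is suspicious.
    """
    tokenizers = analysis.get("tokenizer", {})
    return [name for name, config in tokenizers.items() if "filter" in config]


# The "analysis" section from the settings in the question.
original = {
    "filter": {"umlaut": {"type": "german_normalize"}},
    "tokenizer": {"umlaut": {"type": "standard", "filter": "umlaut"}},
}

# The "analysis" section from the working configuration in this thread.
fixed = {
    "analyzer": {
        "deutsch": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "german_normalize"],
        }
    }
}

print(misplaced_filters(original))  # ['umlaut']
print(misplaced_filters(fixed))     # []
```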