Closed antonsar closed 7 years ago
Hi JPrante,
Please let me know if you need further details. I appreciate it if you could take a look at this issue.
Essentially what I did is following the langdetect example from the README and it was not returning the correct result. The only factor that are different in my environment is I have 2 additional plugins (smart_cn, and kuromoji plugins).
Thanks in advance!
For langdetect
in 5.1.1.0, you have to explicitly declare all languages you want to be detected, like this
PUT /test
{
"mappings": {
"article": {
"properties": {
"content": {
"type": "langdetect",
"languages" : [ "en", "de", "fr" ]
}
}
}
}
}
Bundle 5.1.1.0 is a preview release, not official.
ICU and hyphen is working and documented, all other analyzers are not reviewed and not well documented. The docs and examples are out of sync. Bugs and changes are to be expected. They will be fixed and documented in future versions.
Thank you so much for the clarification!!!
Thanks again
Hello,
I am trying this plugin out to handle document with mixed languages. Unfortunately the type "langdetect" is causing some issue for me.
Here are some info that maybe useful: ES version 5.1.1 This bundle plugin version 5.1.1.0 smart_cn analysis plugin - latest kuromoji analysis plugin - latest
Then I did this (following the example):
curl -XDELETE 'localhost:9200/test' curl -XPUT 'localhost:9200/test' curl -XPOST 'localhost:9200/test/article/_mapping' -d ' { "article" : { "properties" : { "content" : { "type" : "langdetect" } } } } ' curl -XPUT 'localhost:9200/test/article/1' -d ' { "title" : "Some title", "content" : "Oh, say can you see by the dawn
s early light, What so proudly we hailed at the twilight
s last gleaming?" } 'Finally I did the search after calling refresh
curl -XPOST 'localhost:9200/test/_search' -d ' { "query" : { "term" : { "content" : "en" } } } ' However the search above returns 0 hit.
I double check the mapping and "content" now showing like this:
curl -XGET "localhost:9200/test/_mappings?pretty"
"content" : { "type" : "langdetect", "analyzer" : "_keyword", "include_in_all" : false }
calling curl -XGET 'localhost:9200/test/_search' shows this
"_source" : { "content" : "Oh, say can you see by the dawn
s early light, What so proudly we hailed at the twilight
s last gleaming?" }Based off the examples and the result I was getting, I don't think this is the intended behavior. How should I retrieve the detected language code ?
Thank You!