seifip closed this issue 10 years ago
All I could find on Google is the tokenizer from http://www.amfproject.org/wiki/index.php?n=Programming.LunrJS but I can't make it work.
More generally, are there plans to support more languages than just English?
https://github.com/olivernn/lunr.js/pull/96
I implemented Chinese search in an indirect way. This is what my actual user manual application does, so you can give it a try.
1. Analyze the article content with Lucene to extract the keywords.
2. Use a modified lunr.js to build the Chinese lunr index file.
3. When querying, separate multiple keywords with a space (" "), e.g. "巴西 比赛".
4. Use UTF-8 encoding.
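The steps above can be sketched roughly as follows. This is only an illustration of the idea (index keywords that were segmented ahead of time, then query with space-separated keywords), not lunr's API and not the modified lunr.js mentioned in the comment. The sample documents, the `buildIndex` and `search` helpers, and the choice to require every keyword to match are all assumptions made for the example.

```javascript
// Hypothetical documents whose keywords were already extracted by an
// external analyzer (step 1); the keyword lists are made up for illustration.
const docs = [
  { id: "a", keywords: "巴西 比赛 足球" },
  { id: "b", keywords: "巴西 咖啡" },
];

// Step 2: build an inverted index, keyword -> set of document ids.
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const term of doc.keywords.split(/\s+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Step 3: split the query on spaces and keep documents matching every term
// (an assumption for simplicity; a real search library may rank partial matches).
function search(index, query) {
  const terms = query.split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result =
      result === null
        ? new Set(ids)
        : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}

const index = buildIndex(docs);
console.log(search(index, "巴西 比赛")); // ids of documents containing both keywords
```

Because segmentation happens before indexing, the index itself only ever sees whitespace-separated terms, which is why the query has to be segmented the same way.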
There is a lunr-languages project which includes language adapters for lunr. The following languages are currently supported:
The right place for #96 is on lunr-languages.
I don't think there is anything specific to allow for multiple languages in a single index. As a start, you'd have to modify the stop word filters to include stop words for each language you want to support, and the tokeniser might also need to be modified.
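As a rough illustration of those two changes, here is a standalone sketch that does not use lunr's own pipeline or tokenizer API. The stop word list is a tiny made-up sample, `tokenize` and `analyze` are hypothetical helper names, and emitting one token per CJK character is a simplifying assumption (a real setup would more likely use a proper word segmenter).

```javascript
// Hypothetical, deliberately tiny stop-word list covering several languages;
// real lists would be much longer.
const stopWords = new Set(["the", "a", "is", "的", "了", "は", "の"]);

// Split mixed text: runs of Latin letters or digits become lowercase words,
// and each Han, Hiragana, or Katakana character becomes its own token.
function tokenize(text) {
  const pattern =
    /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]|[\p{Script=Latin}\p{Nd}]+/gu;
  return (text.match(pattern) || []).map((t) => t.toLowerCase());
}

// Tokenize, then drop any token found in the combined stop-word list.
function analyze(text) {
  return tokenize(text).filter((t) => !stopWords.has(t));
}

console.log(analyze("The lunr 的 index"));
```

The same `analyze` function would need to be applied both when building the index and when processing queries, otherwise the terms will not line up.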
I use a Node.js module to process Chinese content instead of the default one. The repo is here: https://github.com/codepiano/lunr.js
@codepiano's solution works for me, thanks.
@mzlogin Glad to hear that.
Is there a way to make Lunr.js work with Chinese and Japanese text? (intermixed with English)