Chinese and Japanese support

olivernn / lunr.js

A bit like Solr, but much smaller and not as bright

http://lunrjs.com

MIT License


Chinese and Japanese support #91

Closed seifip closed 10 years ago

seifip commented 10 years ago

Is there a way to make Lunr.js work with Chinese and Japanese text? (intermixed with English)

seifip commented 10 years ago

All I could find on Google is the tokenizer from http://www.amfproject.org/wiki/index.php?n=Programming.LunrJS but I can't make it work.

mauricesvay commented 10 years ago

More generally, are there plans to support languages other than English?

ming300 commented 10 years ago

https://github.com/olivernn/lunr.js/pull/96

I implemented Chinese search in an indirect way. My actual user-manual application works like this, so you can give it a try:

1. Analyze the article content with Lucene to extract keywords.

2. Use the modified lunr.js to build the Chinese lunr index file.

3. When querying, separate multiple keywords with a space (" "), e.g. "巴西比赛".

4. Use UTF-8 encoding.
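The steps above can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the `docs` data, `buildIndex`, and `search` are hypothetical names, the keywords are assumed to have been produced beforehand by an external analyzer, and requiring every query keyword to match is an assumption rather than something stated in the comment.

```javascript
// Hypothetical documents whose keywords were already extracted by an
// external analyzer (step 1). The data here is made up for illustration.
const docs = [
  { id: 'a', keywords: ['巴西', '比赛', '足球'] },
  { id: 'b', keywords: ['巴西', '咖啡'] },
];

// Build a keyword -> Set of document ids map. This stands in for the
// prebuilt index file of step 2; it is not lunr's index format.
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const keyword of doc.keywords) {
      if (!index.has(keyword)) index.set(keyword, new Set());
      index.get(keyword).add(doc.id);
    }
  }
  return index;
}

// Split the query on whitespace (step 3) and return the sorted ids of
// documents that contain every keyword (AND semantics, an assumption).
function search(index, query) {
  const terms = query.trim().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}

const index = buildIndex(docs);
console.log(search(index, '巴西 比赛'));
```

Because segmentation happens before indexing, the search side only ever needs to split on spaces, which is why the query keywords have to be space-separated as well.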

olivernn commented 10 years ago

There is a lunr-languages project which includes language adapters for lunr. The following languages are currently supported:

German
French
Spanish
Italian
Dutch
Danish
Portuguese
Finnish
Romanian
Hungarian
Russian
Norwegian

The right place for #96 is on lunr-languages.

I don't think there is anything specific to allow for multiple languages in a single index. As a start, you'd have to modify the stop word filters to include stop words for each language you want to support. The tokeniser might also need to be modified.
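To make the tokeniser point concrete, here is a minimal sketch of one common approach for mixed English and CJK text: overlapping character bigrams for CJK runs, lowercased words for everything else. `tokenizeMixed` is a hypothetical standalone helper, not a lunr.js function, and wiring it into lunr is left out because that depends on the lunr version in use. It also does not address stop words.

```javascript
// Character classes for Han, Hiragana, and Katakana, plus the Katakana
// prolonged sound mark (U+30FC), which is not in the Katakana script class.
const CJK = '\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\u30FC';

// A run is either a stretch of CJK characters or a stretch of other
// letters and digits. Punctuation and whitespace separate runs.
const TOKEN_RUN = new RegExp(`[${CJK}]+|(?:(?![${CJK}])[\\p{L}\\p{N}])+`, 'gu');
const CJK_START = new RegExp(`^[${CJK}]`, 'u');

// Hypothetical helper: returns an array of tokens for mixed text.
function tokenizeMixed(text) {
  const tokens = [];
  for (const run of text.match(TOKEN_RUN) || []) {
    if (!CJK_START.test(run)) {
      // Non-CJK run: keep it as one lowercased word.
      tokens.push(run.toLowerCase());
      continue;
    }
    const chars = Array.from(run);
    if (chars.length === 1) {
      // A lone CJK character is kept as its own token.
      tokens.push(run);
      continue;
    }
    // CJK run: emit overlapping two-character tokens.
    for (let i = 0; i < chars.length - 1; i++) {
      tokens.push(chars[i] + chars[i + 1]);
    }
  }
  return tokens;
}

console.log(tokenizeMixed('Lunr.js 支持中文'));
// → [ 'lunr', 'js', '支持', '持中', '中文' ]
```

Bigrams need no dictionary, at the cost of a larger index and some false matches across word boundaries. The query string has to go through the same function so that its tokens line up with the indexed ones.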

codepiano commented 10 years ago

I use a Node.js module to process Chinese content instead of the default one. The repo is here: https://github.com/codepiano/lunr.js
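For readers who want to try word-level segmentation without an extra dependency, the sketch below uses the built-in `Intl.Segmenter` API. This is a swapped-in technique for illustration only: it is not the module used in the fork above, `segmentWords` is a hypothetical helper name, and it assumes a runtime that ships `Intl.Segmenter` with full ICU data (for example a recent Node.js release).

```javascript
// Hypothetical helper: split text into word-like segments using the
// runtime's built-in dictionary-based segmenter. The locale is an
// assumption; pass 'ja' for Japanese text.
function segmentWords(text, locale = 'zh') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const words = [];
  for (const part of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (part.isWordLike) words.push(part.segment.toLowerCase());
  }
  return words;
}

// The exact Chinese word boundaries depend on the ICU dictionary in the
// runtime, so no expected output is shown here.
console.log(segmentWords('Lunr 支持中文搜索'));
```

The resulting words could then be joined with spaces before indexing, so that a whitespace-based tokenizer sees them as separate terms.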

mzlogin commented 8 years ago

@codepiano's solution works for me, thanks.

codepiano commented 8 years ago

@mzlogin Glad to hear that.