valeriansaliou / sonic

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
https://crates.io/crates/sonic-server
Mozilla Public License 2.0
20.11k stars 578 forks

Add Lindera tokenizer support for Japanese #311

Closed nmkj-io closed 1 year ago

nmkj-io commented 1 year ago

This PR adds support for the Lindera tokenizer to improve search results in Japanese. It uses the UniDic dictionary, since it is newer than IPADIC and could provide consistent search results [1].

[1] https://www.anlp.jp/proceedings/annual_meeting/2016/pdf_dir/D6-5.pdf

supakaity commented 1 year ago

Thanks! This will hopefully help improve some of the translations of our Japanese posts.

valeriansaliou commented 1 year ago

Thank you so much! Will issue a Sonic release now.