valeriansaliou / sonic

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
https://crates.io/crates/sonic-server
Mozilla Public License 2.0
20.11k stars 578 forks

Locale inferring is not quite accurate #322

Open Catty2014 opened 4 months ago

Catty2014 commented 4 months ago

With the ingestion commands and queries below, Sonic returns no results unless the locale is passed explicitly, so I think locale inference is not quite accurate. (Only Chinese (cmn) and Japanese (jpn) were tested here.)

Ingestion commands:

START ingest SecretPassword
PUSH music name music1 "塞壬唱片-MSR DJ Okawari Stephanie - Your Star" LANG(cmn)
PUSH music name music2 "遥そら - 恋ひ恋ふ縁" LANG(jpn)
PUSH music name music3 "warma - 泛泛人类不会祈祷 人声本家" LANG(cmn)
PUSH music name music4 "心华 - 一人行者" LANG(cmn)
PUSH music name music5 "warma - 【翻唱】朝汐" LANG(cmn)
QUIT

Queries with responses (no explicit locale; every result set comes back empty):

CONNECTED <sonic-server v1.4.9>
START search SecretPassword
STARTED search protocol(1) buffer(20000)
QUERY music name "warma 人类"
PENDING Mk6ornLf
EVENT QUERY Mk6ornLf 
QUERY music name "人类 warma"
PENDING aCsyd85d
EVENT QUERY aCsyd85d 
QUERY music name "ひ恋ふ そら"
PENDING 7g4XLmlD
EVENT QUERY 7g4XLmlD 
QUIT
ENDED quit
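
To make the empty results in the transcript above easier to check programmatically, here is a minimal Python sketch that splits an `EVENT QUERY` line into its marker and the returned object IDs. The function name `parse_event_query` is hypothetical and is not part of Sonic or any client library; it only assumes the whitespace-separated line layout visible in the transcript.

```python
def parse_event_query(line: str):
    """Split 'EVENT QUERY <marker> [<id> ...]' into (marker, list_of_ids).

    Hypothetical helper: it relies only on the line layout shown in the
    transcript, where results are whitespace-separated after the marker.
    """
    parts = line.split()
    if len(parts) < 3 or parts[0] != "EVENT" or parts[1] != "QUERY":
        raise ValueError(f"not an EVENT QUERY line: {line!r}")
    # Everything after the marker is a result ID; an empty list means no match.
    return parts[2], parts[3:]
```

Applied to the lines above, the result list is empty for all three queries, which is the symptom being reported.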

Correct behavior with explicit locale:

CONNECTED <sonic-server v1.4.9>
START search SecretPassword
STARTED search protocol(1) buffer(20000)
QUERY music name "warma 人类" LANG(cmn)
PENDING 7hxC95s3
EVENT QUERY 7hxC95s3 music3
QUERY music name "人类 warma" LANG(cmn)
PENDING 9jGco7ss
EVENT QUERY 9jGco7ss music3
QUERY music name "ひ恋ふ そら" LANG(jpn)
PENDING kv4ZPURE
EVENT QUERY kv4ZPURE music2
QUIT
ENDED quit
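
As a possible client-side workaround until inference improves, the `LANG(...)` hint could be attached automatically. The sketch below is an assumption-laden illustration, not how Sonic itself detects locales: `guess_lang` and `build_query` are hypothetical names, the heuristic only separates kana from Han ideographs, and the quote escaping is assumed rather than taken from this issue.

```python
def guess_lang(text: str):
    """Hypothetical heuristic: 'jpn' if any kana is present,
    'cmn' if only Han ideographs are found, otherwise None."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:   # Hiragana and Katakana blocks
            return "jpn"
        if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs block
            has_han = True
    return "cmn" if has_han else None


def build_query(collection: str, bucket: str, terms: str, lang=None) -> str:
    """Build a QUERY command, appending LANG(<code>) when a locale is known."""
    # Assumption: backslashes and double quotes are backslash-escaped.
    escaped = terms.replace("\\", "\\\\").replace('"', '\\"')
    command = f'QUERY {collection} {bucket} "{escaped}"'
    if lang is None:
        lang = guess_lang(terms)
    if lang is not None:
        command += f" LANG({lang})"
    return command
```

For the queries in this report, `build_query("music", "name", "warma 人类")` yields the same command as the explicit `LANG(cmn)` query above. Note the limitation: Japanese text written only in kanji would be tagged `cmn` by this heuristic, so passing `lang` explicitly remains the safer option when the locale is known.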

Sonic version: v1.4.9 (Docker)