robert7 / nixnote2

Nixnote - Evernote desktop client for Linux
GNU General Public License v3.0
297 stars, 32 forks

search function is not right for chinese #132

Closed zhao414 closed 5 years ago

zhao414 commented 5 years ago

Expected vs. actual behavior

Searching by a Chinese note title sometimes does not return the correct result. For example, suppose a note title consists of 3 Chinese characters, written here as 'ABC'. Searching for "AB" or "ABC" works. Searching for "BC" fails when 'B' and 'C' do not form a Chinese word, i.e. 'BC' is not a Chinese word. However, when 'BC' is a Chinese word, the search succeeds. FYI, a Chinese word is composed of one or more (usually 2 or 3) Chinese characters.

Steps to reproduce the problem

case1:

  1. create a note, with the title in Chinese, such as: 你好世界
  2. sync and reindex (and sync again and reindex again)
  3. searching with 你好 or 你好世界 will return the note created
  4. searching with 好 or 好世界 or 世界 will not return the note created.

FYI: 你好 is a Chinese word (meaning "hello"), and 世界 is also a Chinese word (meaning "world").

case2:

  1. create a note, with the title in Chinese and containing an English (ASCII) colon, such as: 你好:世界
  2. sync and reindex (and sync again and reindex again)
  3. searching with 你好 or 你好:世界 will return the note created
  4. searching with 你好 or 你好: or :世界 or 世界 will return the note created
  5. searching with 好 or 好:世 or 好:世界 or 界 will not return the note created.

robert7 commented 5 years ago

This is because the current implementation divides text into words, which does not seem to work for your language. It would need a different "stemming algorithm".
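To illustrate the point, here is a toy model in Python. It is not NixNote's code, and the names `tokenize` and `matches` are made up for this sketch. It assumes that text is split into words on whitespace and ASCII punctuation, and that each search term must be a prefix of some word in the title. Under those assumptions an unbroken run of Chinese characters becomes a single "word", which would reproduce the results reported in both cases above.

```python
import re

def tokenize(text):
    """Split on whitespace and common ASCII punctuation (assumed delimiters).

    A run of Chinese characters contains none of these, so it stays one token.
    """
    return [t for t in re.split(r"[\s:;,.!?]+", text) if t]

def matches(title, query):
    """True if every query token is a prefix of at least one title token."""
    title_tokens = tokenize(title)
    query_tokens = tokenize(query)
    if not query_tokens:
        return False
    return all(
        any(word.startswith(term) for word in title_tokens)
        for term in query_tokens
    )

# Case 1: the whole title is one token, so only prefixes of it are found.
for query in ["你好", "你好世界", "好", "好世界", "世界"]:
    print("你好世界", query, matches("你好世界", query))

# Case 2: the colon splits the title into two tokens, 你好 and 世界.
for query in ["你好", "你好:", ":世界", "世界", "好", "好:世", "界"]:
    print("你好:世界", query, matches("你好:世界", query))
```

If this model is close to what happens, whether 'BC' is a real Chinese word would not matter. What matters is whether the search term starts at a position the splitter treats as a word boundary.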

Unfortunately, with the (little) time I have for this project, refactoring this is completely out of scope for me personally.

It would work if you put a space between the words; you could do that at least for the titles. This could serve as a workaround.

Or find someone who understands your language and would be willing to implement new stemming. Pull requests are always welcome.
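For anyone considering such a pull request, one common direction for text without spaces is to index individual characters with their positions and treat a query as a run of consecutive characters. The Python sketch below only illustrates that idea. It is not a design for NixNote, the helper names (`char_tokens`, `build_index`, `search`) are hypothetical, and it assumes punctuation and spaces can simply be dropped from both titles and queries.

```python
from collections import defaultdict

def char_tokens(text):
    """Return the letters and digits of text in order, dropping spaces and punctuation."""
    return [ch for ch in text if ch.isalnum()]

def build_index(titles):
    """Map each character to the set of (note_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for note_id, title in enumerate(titles):
        for pos, ch in enumerate(char_tokens(title)):
            index[ch].add((note_id, pos))
    return index

def search(index, query):
    """Return ids of notes whose title contains the query characters consecutively."""
    chars = char_tokens(query)
    if not chars:
        return set()
    hits = set()
    # Try every occurrence of the first character as a possible start.
    for note_id, start in index.get(chars[0], ()):
        if all((note_id, start + i) in index.get(ch, ())
               for i, ch in enumerate(chars)):
            hits.add(note_id)
    return hits

titles = ["你好世界", "你好:世界", "再见"]
index = build_index(titles)
print(search(index, "世界"))  # ids of the notes whose title contains 世界
```

With this scheme 好, 好世界 and 世界 would all find the first two notes, at the cost of a larger index and of also matching character runs that are not meaningful words.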