grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1

Korean search problem #167

Open lovedownload opened 1 month ago

lovedownload commented 1 month ago

When searching in Korean, the Korean text is broken and the search does not work properly.

I'm currently using version 1.5.5, which contains the following code:

icu::UnicodeString h2 = icu::UnicodeString::fromUTF8(icu::StringPiece(h));
icu::UnicodeString t2 = icu::UnicodeString::fromUTF8(icu::StringPiece(t));

In the updated version 1.7.14, the above code appears to have been removed, and Korean text is no longer processed properly.

What could be the cause?

grosjo commented 1 month ago

Can you give an example of your Korean search (which header, or full text, etc.)?

grosjo commented 1 month ago

Can you set verbose=1 in your config and check the request in the logs?

grosjo commented 1 month ago

And please use the latest git version.

lovedownload commented 1 month ago

Hello.

I tested with the latest version, and the search results are still wrong: the Korean text is broken whether the search targets the title or the body. After setting verbose=1, I checked the log, and the search keyword was not displayed correctly there either; it appeared garbled.

The version I used before was 1.5.5, and there was no problem with that version.

While checking what the problem was, I found the following code that was in version 1.5.5 but not in the latest version.

icu::UnicodeString h2 = icu::UnicodeString::fromUTF8(icu::StringPiece(h));
icu::UnicodeString t2 = icu::UnicodeString::fromUTF8(icu::StringPiece(t));

I could not find this Unicode-handling code in the latest version.
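For context, here is a minimal sketch of why the UTF-8 handling matters for Korean. It is not the plugin's code: the helper names `count_code_points` and `prefix_code_points` are hypothetical, and it is only an assumption that byte-wise handling is related to the garbled keywords reported here. It uses the standard library only, and shows that a Korean string cut at a byte offset can end in the middle of a character, while cutting at a code-point boundary cannot.

```cpp
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Count code points in a well-formed UTF-8 string by counting
// every byte that is not a continuation byte.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if (!is_continuation(b)) {
            ++n;
        }
    }
    return n;
}

// Return the first max_cp code points of s (hypothetical helper).
// The cut always falls on a character boundary, so no multi-byte
// character is split.
std::string prefix_code_points(const std::string& s, std::size_t max_cp) {
    std::size_t seen = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (seen == max_cp) {
                break;  // next character would exceed the limit
            }
            ++seen;
        }
        ++i;
    }
    return s.substr(0, i);
}

// "한국어" written as explicit UTF-8 bytes, 3 bytes per syllable.
const std::string kKorean = "\xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4";
```

With these helpers, `kKorean` is 9 bytes but 3 code points. `prefix_code_points(kKorean, 1)` returns the 3 bytes of the first syllable, whereas `kKorean.substr(0, 4)` returns those 3 bytes plus a dangling lead byte, which is not valid UTF-8 and would show up as broken text in a log.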

grosjo commented 1 month ago

ICU (Unicode) has been in use for a long time, so I am not sure what you mean by "There was no code for Unicode processing in the latest version."

Can you please provide the logs as asked?

Thank you so much