xelkano / redmine_xapian

This plugin allows searching across attachments with the Xapian search engine
GNU General Public License v2.0

In search results, the Chinese description is garbled #111

Closed: keineahnung2345 closed this issue 3 years ago

keineahnung2345 commented 3 years ago

When I search for the text "判断" ("judgment") in my document, it shows the following result: [screenshot of the garbled search result]

I've traced the problem to the following lines:

https://github.com/xelkano/redmine_xapian/blob/fedf924a377dbb89b866f645428ee9aafe9207eb/lib/redmine_xapian/xapian_search.rb#L196-L198

dochash[:sample].encoding is ASCII-8BIT, but it should be UTF-8. After I changed dochash[:sample].encode('UTF-8', dochash[:sample].encoding, ...) to dochash[:sample].encode('UTF-8', 'UTF-8', ...), the search result displays correctly:

[screenshot of the corrected search result]

But this is just a workaround, and I don't know why dochash[:sample] is tagged with the ASCII-8BIT encoding. Is there a better fix? Thanks.
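A minimal sketch of the mechanism, assuming the elided options in the original call include invalid: :replace and undef: :replace (that part is a guess, not taken from the plugin): when the bytes are already UTF-8 but the string is tagged ASCII-8BIT, passing the string's own encoding as the conversion source treats every byte at or above 0x80 as an undefined binary character and replaces it. Relabelling with force_encoding keeps the bytes as they are. The variable names are illustrative only.

```ruby
# Sketch: a UTF-8 snippet that is tagged ASCII-8BIT (binary), as reported
# for dochash[:sample]. Variable names are illustrative, not from the plugin.
sample = "判断".b # same bytes as the UTF-8 literal, but labelled binary

# Converting FROM the binary encoding: bytes >= 0x80 have no mapping in
# ASCII-8BIT, so undef: :replace swaps each one for U+FFFD (garbled text).
garbled = sample.encode('UTF-8', sample.encoding, invalid: :replace, undef: :replace)

# Relabelling instead of converting: the bytes are kept, only the encoding
# tag changes; scrub replaces any sequences that are not valid UTF-8.
fixed = sample.dup.force_encoding('UTF-8').scrub

puts sample.encoding # the binary encoding (ASCII-8BIT)
puts garbled         # U+FFFD replacement characters instead of the text
puts fixed           # 判断
```

Relabelling is only correct if the bytes really are UTF-8; if the source could contain other encodings, a real conversion with the true source encoding would be needed instead.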

keineahnung2345 commented 3 years ago

I've also found the following in: https://github.com/xelkano/redmine_xapian/blob/fedf924a377dbb89b866f645428ee9aafe9207eb/lib/redmine_xapian/xapian_search.rb#L77-L79

query_string is UTF-8, but query.description is ASCII-8BIT, and I can't find any related information on Xapian's website.
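A sketch of why this mismatch matters and one way to normalize it, assuming the Ruby bindings hand back UTF-8 bytes tagged ASCII-8BIT (consistent with what is observed here, but not confirmed by Xapian's documentation). The helper utf8_label is hypothetical, and the binary string is a made-up stand-in for query.description, not its real format.

```ruby
# Hypothetical helper: relabel a string as UTF-8 without converting its bytes.
# Assumes the bytes are UTF-8 even though the tag says ASCII-8BIT.
def utf8_label(str)
  return str if str.encoding == Encoding::UTF_8
  str.dup.force_encoding(Encoding::UTF_8).scrub # scrub replaces invalid sequences
end

query_string = "判断"               # UTF-8, as reported in the issue
description  = "stand-in: 判断".b   # made-up stand-in for query.description

# Mixing a non-ASCII UTF-8 string with a non-ASCII binary string fails.
begin
  mixed = query_string + description
  puts mixed
rescue Encoding::CompatibilityError => e
  puts "mixing encodings fails: #{e.class}"
end

# After relabelling, both operands are UTF-8 and concatenation works.
combined = query_string + " " + utf8_label(description)
puts combined
puts combined.encoding
```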

picman commented 3 years ago

[screenshot]