pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

Problem with UTF8 #1109

Closed by sesaretah 6 years ago

sesaretah commented 6 years ago

The following Rails query:

Article.search 'book'

turns into

SELECT * FROM `article_core` WHERE MATCH('book') AND `sphinx_deleted` = 0 LIMIT 0, 20

but

Article.search 'روسیه'

turns into

SELECT `articles`.* FROM `articles` WHERE `articles`.`id` IN (1, 3)

Why does this happen?

pat commented 6 years ago

Can you re-check your logs on this? I think the second query you have there (on articles) is an ActiveRecord query, whereas the first (on article_core) is to Sphinx.

pat commented 6 years ago

Also, it's worth noting: each search call results in a query to Sphinx, and then a query to your database to translate the Sphinx results into ActiveRecord objects. That's why you're seeing both types of queries.
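
To make the two-step flow concrete, here is a rough plain-Ruby simulation. It is not Thinking Sphinx's actual implementation: `FAKE_INDEX`, `FAKE_TABLE`, `search_ids`, `load_records` and `search` are all hypothetical stand-ins, and the sample titles are made up. It only shows the shape of "ask the index for matching ids, then load those ids from the database".

```ruby
# Hypothetical stand-in for the Sphinx index: search term => matching document ids.
FAKE_INDEX = { 'book' => [1, 3] }.freeze

# Hypothetical stand-in for the `articles` table, keyed by primary key.
FAKE_TABLE = {
  1 => { id: 1, title: 'A book about search' },
  2 => { id: 2, title: 'Unrelated article' },
  3 => { id: 3, title: 'Another book' }
}.freeze

# Step 1: the "Sphinx" query. Returns only ids, never full records.
def search_ids(term)
  FAKE_INDEX.fetch(term, [])
end

# Step 2: the "database" query, comparable to WHERE id IN (...).
def load_records(ids)
  FAKE_TABLE.values_at(*ids).compact
end

# A search is step 1 followed by step 2, so two queries show up in the logs.
def search(term)
  load_records(search_ids(term))
end

search('book').map { |record| record[:id] } # => [1, 3]
```

Under this picture, the `IN (1, 3)` query in the log is step 2, which suggests the Sphinx query for that search ran just before it.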

sesaretah commented 6 years ago

I found the problem: putting ngram_chars in the config file causes it, and that setting was unnecessary for indexing Arabic, Persian and Urdu. Setting the right charset_table does the trick, i.e. charset_table: 0..9, a..z, _, A..Z->a..z, U+621..U+63a, U+640..U+64a, U+66e..U+66f, U+671..U+6d3, U+6d5, U+6e5..U+6e6, U+6ee..U+6ef, U+6fa..U+6fc, U+6ff
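
A quick way to sanity-check a charset_table like this against a search term is to test each character's code point against the ranges. The sketch below is plain Ruby and does not call Sphinx or Thinking Sphinx; `CHARSET_RANGES` is transcribed by hand from the charset_table above (the `A..Z->a..z` mapping is treated simply as "A..Z is accepted"), and `uncovered_chars` is a hypothetical helper name.

```ruby
# Code point ranges transcribed from the charset_table above (an assumption:
# the case-folding rule A..Z->a..z is represented only as the accepted range A..Z).
CHARSET_RANGES = [
  0x30..0x39,   # 0..9
  0x61..0x7a,   # a..z
  0x5f..0x5f,   # _
  0x41..0x5a,   # A..Z
  0x621..0x63a, 0x640..0x64a, 0x66e..0x66f, 0x671..0x6d3,
  0x6d5..0x6d5, 0x6e5..0x6e6, 0x6ee..0x6ef, 0x6fa..0x6fc,
  0x6ff..0x6ff
].freeze

# Hypothetical helper: returns the characters of `text` that fall outside
# every range, i.e. characters that such a table would not treat as word characters.
def uncovered_chars(text, ranges = CHARSET_RANGES)
  text.each_char.reject { |ch| ranges.any? { |range| range.cover?(ch.ord) } }
end

uncovered_chars('book')   # => []
uncovered_chars('روسیه')  # => []
uncovered_chars('a b')    # => [" "]
```

An empty result means every character of the term is covered by the table; any character that is returned would need to be added to charset_table before it can be indexed as part of a word.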

pat commented 6 years ago

Thanks for sharing the solution :)