o19s / quepid

Improve your Elasticsearch, OpenSearch, Solr, Vectara, Algolia and Custom Search search quality.
http://www.quepid.com
Apache License 2.0

"Illegal mix of collations" error when using CSV Static Endpoint for queries with emojis #1046

Open: shuttie opened this issue 3 months ago


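Describe the bug

When I create a CSV Static Endpoint from a CSV file whose queries contain characters outside the Basic Multilingual Plane (e.g. emojis, which UTF-16 encodes as surrogate pairs and UTF-8 encodes as 4-byte sequences), the import fails: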

[75432bc3-6079-4423-8949-020bdc6d7abb] Completed 500 Internal Server Error in 15ms (ActiveRecord: 7.5ms | Allocations: 2639)
[75432bc3-6079-4423-8949-020bdc6d7abb]   
[75432bc3-6079-4423-8949-020bdc6d7abb] ActiveRecord::StatementInvalid (Mysql2::Error: Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_bin,COERCIBLE) for operation '='):
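The error message names both sides of the failing comparison: an IMPLICIT (column) value with collation utf8mb3_general_ci and a COERCIBLE (literal) value with collation utf8mb4_bin. utf8mb3 stores at most 3 bytes per character, so it can only represent code points up to U+FFFF, and emojis such as 🍟 lie above that range. The failure is presumably triggered by exactly those characters. The standalone Python sketch below is not Quepid code, and the helper names needs_4_bytes and describe are made up for illustration. It shows which characters in the sample query take 4 bytes in UTF-8 and a surrogate pair in UTF-16.

```python
# Sketch: which characters in the sample query fall outside what
# MySQL's utf8mb3 can store (code points above U+FFFF)?

QUERY = "kfc 🍟➕🍔➕🍗"


def needs_4_bytes(ch: str) -> bool:
    """True if ch lies outside the Basic Multilingual Plane, so UTF-8
    encodes it in 4 bytes and UTF-16 encodes it as a surrogate pair."""
    return ord(ch) > 0xFFFF


def describe(text: str) -> list[tuple[str, str, int, int]]:
    """Return (char, code point, UTF-8 byte count, UTF-16 code units) per character."""
    rows = []
    for ch in text:
        utf8_len = len(ch.encode("utf-8"))
        utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 bytes per code unit
        rows.append((ch, f"U+{ord(ch):04X}", utf8_len, utf16_units))
    return rows


if __name__ == "__main__":
    for ch, cp, b8, u16 in describe(QUERY):
        flag = "needs utf8mb4" if needs_4_bytes(ch) else ""
        print(f"{cp:>8}  utf8={b8}  utf16={u16}  {flag}")
```

In this query only the three food emojis are flagged. The plus signs (U+2795) fit in 3 bytes and would be storable in utf8mb3.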

A sample file reproducing the issue: kfc.csv. The query in it is kfc 🍟➕🍔➕🍗 (yes, I also wonder who searches like that, but here we are).

To Reproduce

Steps to reproduce the behavior on v7.17.1:

  1. Go to Relevance Cases > Create case.
  2. Click 'CSV Static Endpoint' and upload the sample file (or any other file whose queries contain such characters).
  3. Click 'import'.
  4. The UI hangs, and the console shows the Mysql2::Error: Illegal mix of collations error.
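To tell in advance whether a CSV will hit this error in step 2, you can scan it for cells that contain code points above U+FFFF. The following is a hypothetical pre-upload check, not part of Quepid. The function name find_non_bmp_cells is made up. The column names in the sample (query, doc_id) are placeholders, and the real CSV Static Endpoint layout may differ. The function does not depend on the layout, because it checks every cell.

```python
import csv
import io


def find_non_bmp_cells(csv_text: str) -> list[tuple[int, int, str]]:
    """Return (row_number, column_number, cell) for every cell that contains
    a code point above U+FFFF (4 bytes in UTF-8, not storable in utf8mb3).
    Numbers are 1-based and the header row counts as row 1."""
    hits = []
    # newline="" leaves line endings untouched so csv handles quoted newlines
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            if any(ord(ch) > 0xFFFF for ch in cell):
                hits.append((row_no, col_no, cell))
    return hits


# Hypothetical sample: the header names are placeholders, not Quepid's format.
SAMPLE = 'query,doc_id\n"kfc 🍟➕🍔➕🍗",1\nburger,2\n'

if __name__ == "__main__":
    for row_no, col_no, cell in find_non_bmp_cells(SAMPLE):
        print(f"row {row_no}, column {col_no}: {cell!r}")
```

To check a real file, read it with open(path, encoding="utf-8", newline="") and pass the text to find_non_bmp_cells.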

Expected behavior

The import should succeed, and the UI should not break, even on unusual queries like this one.