requarks / wiki-v1

Legacy version (1.x) of Wiki.js
https://wiki.js.org
GNU Affero General Public License v3.0

Search doesn't work with cyrillic #10

Open xTNTx opened 7 years ago

xTNTx commented 7 years ago

Actual behavior

Search always returns an empty result for Cyrillic keywords.

Expected behavior

Search should work the same way as it does for Latin text/keywords.

Steps to reproduce the behavior

bennycode commented 7 years ago

Suggestion: Using a char map for transliteration (like this one: https://github.com/pid/speakingurl/blob/v13.0.0/lib/speakingurl.js#L8) might be of help here! 😃
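
To make the char map idea concrete, here is a minimal sketch. The map below is a hypothetical, deliberately tiny subset written for illustration; it is not copied from speakingurl, whose map covers far more characters and scripts. The names `CHAR_MAP` and `transliterateWithMap` are assumptions, not part of any library.

```javascript
// Hypothetical, deliberately tiny char map for illustration only.
// A real map (such as the one linked above) covers many more characters.
const CHAR_MAP = new Map([
  ['в', 'v'], ['и', 'i'], ['н', 'n'], ['к', 'k'],
  ['с', 's'], ['о', 'o'], ['п', 'p'],
]);

// Lowercase the input, then replace each mapped character with its Latin
// counterpart. Characters that are not in the map are left untouched.
function transliterateWithMap(text) {
  return Array.from(text.toLowerCase())
    .map((ch) => (CHAR_MAP.has(ch) ? CHAR_MAP.get(ch) : ch))
    .join('');
}

console.log(transliterateWithMap('Вин'));        // "vin"
console.log(transliterateWithMap('Поиск wiki')); // "poisk wiki"
```

Applying the same function to both the indexed text and the query would let a Latin query match Cyrillic content, at the cost of treating any characters that map to the same Latin letter as identical.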

NGPixel commented 7 years ago

@bennyn Thanks! That should be useful.

bennycode commented 7 years ago

I just wrote some demo code to showcase the usage of "speakingurl":

const speakingurl = require("speakingurl");
const directory = ['Παναγιώτα', 'Элизабет', 'Вин'];

// Case-insensitive substring check; the transliteration itself is done by speakingurl.
function includesIgnoreCase(string = '', query = '') {
  return string.toLowerCase().includes(query.toLowerCase());
}

// Transliterate both the entry and the query before comparing them.
function search(query) {
  return directory.filter((entry) => includesIgnoreCase(speakingurl(entry), speakingurl(query)));
}

const query = 'Panagiota';
const results = search(query);

console.log(results.join(',')); // "Παναγιώτα"

You can test it here: https://npm.runkit.com/speakingurl

xTNTx commented 7 years ago

Transliteration might reduce the quality of the search. The search-index demo shows that it works correctly with non-Latin words, so the issue is somewhere in between.

So far I have found that Cyrillic text is filtered out by this regexp, so it is never added to the index. However, fixing this alone did not make the search work.
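
The regexp in question is not quoted in this thread, so the following is only a sketch of the general failure mode, assuming an ASCII-only filter. The patterns `asciiOnly` and `unicodeAware` and the `tokenize` helper are hypothetical names, not Wiki.js code. The Unicode-aware variant relies on Unicode property escapes, which need the `u` flag and Node.js 10 or later.

```javascript
// Hypothetical ASCII-only filter, standing in for the regexp mentioned above.
// Everything that is not a-z, 0-9 or a space is stripped, including Cyrillic.
const asciiOnly = /[^a-z0-9 ]/g;

// Unicode-aware alternative: keep letters (\p{L}) and digits (\p{N})
// from any script.
const unicodeAware = /[^\p{L}\p{N} ]/gu;

// Lowercase the text, blank out filtered characters, and split into tokens.
function tokenize(text, stripPattern) {
  return text
    .toLowerCase()
    .replace(stripPattern, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

console.log(tokenize('Поиск wiki', asciiOnly));    // [ 'wiki' ]
console.log(tokenize('Поиск wiki', unicodeAware)); // [ 'поиск', 'wiki' ]
```

With the ASCII-only pattern the Cyrillic word never reaches the index. As noted above, though, relaxing the filter is not sufficient on its own if the query goes through a different path that still drops those characters.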