jaeksoft / opensearchserver

Open-source Enterprise Grade Search Engine Software
http://www.opensearchserver.com
Apache License 2.0
499 stars, 190 forks

Automatic Language detection #1825

Open Mojster opened 8 years ago

Mojster commented 8 years ago

Hi,

I entered the URL http://www.sicris.si/public/jqm/memo.aspx?lang=slv&opdescr=faq&source=evaluation.inc&opt=3&subopt=7 into Manual crawl. Automatic language detection reported "Lang: cs". It should be "sl" (Slovenian), not Czech.

Mojster commented 8 years ago

I found an article in your FAQ: How the lang attribute of webpages gets detected

So the fallback with content detection is not working properly. We'll try to solve this with language params on our test site and see how this works out.

Mojster commented 7 years ago

One option is to put the language param in the HTML documents. Then it detects SL. But in the results it still returns English as the first result.

I think I could solve this with the language param in the query, but the available languages do not include Slovenian.

Is there a possibility to add this?
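
For illustration, here is a minimal Python sketch of the "language param in HTML documents" mentioned above. It assumes this means the `lang` attribute on the `<html>` element, as the FAQ article title suggests. The helper names (`LangAttrParser`, `declared_lang`) and the sample page are hypothetical and are not part of OpenSearchServer; the sketch only shows how a declared language can be read from a page.

```python
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag it sees."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_lang(html_text):
    """Return the declared page language, or None if it is not declared."""
    parser = LangAttrParser()
    parser.feed(html_text)
    return parser.lang


# Hypothetical page that declares Slovenian explicitly.
page = '<!DOCTYPE html><html lang="sl"><head><title>FAQ</title></head><body></body></html>'
print(declared_lang(page))
```

When the attribute is missing, `declared_lang` returns `None`, which is the case where a crawler would have to fall back to detecting the language from the content.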

Mojster commented 6 years ago

Let me turn my question around. Would you add the Slovene language to your "ngram detection"?

In issue #1822 you once gave me instructions on how to add a Slovene lemmatizer, which I did. But how can I use it? And if I'm right, it is not connected with the "ngram detection"?
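
To illustrate why an n-gram detector without a Slovene profile can plausibly confuse Slovene with Czech, here is a toy Python sketch of character-trigram matching. It is not OpenSearchServer's implementation. The two sample sentences, the `trigrams` and `guess_language` helpers, and the simple overlap score are all assumptions made for this example. Real detectors use much larger profiles and frequency-weighted scoring.

```python
def trigrams(text):
    """Return the set of character trigrams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


# Tiny illustrative samples; a real profile is built from a large corpus.
SAMPLES = {
    "sl": "Dober dan, kako ste danes?",
    "cs": "Dobrý den, jak se máte dnes?",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    """Pick the profile sharing the most trigrams with text."""
    grams = trigrams(text)
    scores = {lang: len(grams & profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores


# Trigrams common to both profiles: the overlap that makes related
# languages hard to tell apart.
shared = PROFILES["sl"] & PROFILES["cs"]
print(sorted(shared))
print(guess_language(SAMPLES["sl"]))
```

If the "sl" entry is removed from `SAMPLES`, Slovene text can only be matched against the remaining profiles, so the closest related language wins.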