skttl / umbraco-fulltextsearch8

Full Text indexing and searching for Umbraco 8 and Examine.
MIT License

Searchresults inaccurate with french characters #76

Closed: Sven883 closed this issue 2 years ago

Sven883 commented 2 years ago

Hello!

We've been using the fulltextsearch package for a while now. Recently, while testing the search results, we noticed that French queries (our specific case) return less accurate results.

Example query: 'pompe à chaleur' with multiRelevance and wildcardEnabled. It seems that the search also returns every page that contains the character 'à'.

Should we filter the 'à' out of the search query before performing the actual search with the fulltextsearch package? Or should we tweak the settings (multiRelevance, wildcardEnabled, ...)?

Thanks in advance! Sven

skttl commented 2 years ago

Hi Sven

I think it has to do with the default Lucene implementation in Examine expecting the language to be English. Since à is not a stop word in English, I think it is kept and searched like any other term.

There is an article about languages and Lucene here: https://skrift.io/issues/creating-a-custom-language-analyzer-for-umbraco-8/. It's not something I know much about, though, sorry.
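
For illustration only, here is a minimal sketch of the workaround asked about above: dropping French stop words from the query string before it is handed to the search. It is written in Python to show the idea and is not part of the fulltextsearch package or Examine. The stop-word list and the `strip_stop_words` name are hypothetical, and the list is deliberately short.

```python
# Hypothetical, deliberately short French stop-word list (assumption, not from the package).
FRENCH_STOP_WORDS = {"à", "au", "aux", "de", "des", "du", "en", "et", "la", "le", "les", "un", "une"}


def strip_stop_words(query: str, stop_words=FRENCH_STOP_WORDS) -> str:
    """Return the query with stop words removed, keeping the remaining terms in order."""
    kept = [term for term in query.split() if term.lower() not in stop_words]
    # If every term was a stop word, fall back to the original query
    # so the search is never run with an empty string.
    return " ".join(kept) if kept else query


if __name__ == "__main__":
    print(strip_stop_words("pompe à chaleur"))
```

Filtering the query only changes what is searched for. The 'à' terms would still be in the index, which is why a language-specific analyzer, as described in the linked article, is probably the more thorough fix.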