innoq / iqvoc

iQvoc - A SKOS(-XL) Vocabulary Management System for the Semantic Web
http://iqvoc.net/

Problems ordering special characters (AÂÃâã) #355

Open carlamartinsab opened 8 years ago

carlamartinsab commented 8 years ago

In Portuguese there are words that start with Á (Água or água) and Â (Ângulo or ângulo). iQvoc does not recognize that these words start with A and should appear at the beginning of the alphabetically ordered view, together with the other words starting with A. Instead, they appear at the end, after Z. The same may happen in German, which has words with umlauts ("¨"). How can I solve this?
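The grouping described above can be sketched in plain Ruby. This is only an illustration, not iQvoc's actual code: `base_initial` is a hypothetical helper, and the sketch assumes the labels are UTF-8 strings already loaded in memory.

```ruby
# Hypothetical helper (not part of iQvoc): map a label to the unaccented,
# upper-case form of its first letter.
def base_initial(label)
  # NFD decomposition splits "Á" into "A" plus a combining acute accent;
  # \p{Mn} matches such combining marks so they can be removed.
  folded = label.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  folded[0].to_s.upcase
end

# Sample labels taken from this thread.
labels = ["Água", "Ângulo", "Amostra de área", "Balanceamento"]

# Group labels under their unaccented initial, as an alphabetical view would.
groups = labels.group_by { |label| base_initial(label) }

groups.each { |letter, members| puts "#{letter}: #{members.join(', ')}" }
```

With this folding, "Água" and "Ângulo" land in the same bucket as "Amostra de área". Letters that have no canonical decomposition (for example "ß" or "ø") are left unchanged by this approach.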

Thank you Carla.

mjansing commented 8 years ago

Hi @carlamartinsab,

Yes, this looks like a bug. I tested this behaviour with a German thesaurus. The alphabetical concept listing also shows links for concepts starting with German umlauts (e.g. Ä, Ö, etc.), but these listings are always empty, even when concepts starting with such a character exist.

One more thing: can you test iQvoc's search for concepts starting with one of your special characters? In my case the search couldn't find any such concepts, even though some exist.

carlamartinsab commented 8 years ago

Hi @mjansing

Exactly. When searching for words starting with "A", the words "Água" and "Ângulo" are not returned. The same occurs when searching for "Á": it returns neither "A" nor "á". This behaviour is not acceptable for Portuguese users, as many terms carry accents.

Thanks Carla

mjansing commented 8 years ago

@carlamartinsab thank you for investigating. That's also a problem in German. I'll try to fix this issue soon.

carlamartinsab commented 8 years ago

@mjansing OK! Thank you in advance!

mjansing commented 8 years ago

Should be fixed with 426ce2c. Feel free to reopen if there are any issues.

We'll release a new iqvoc version to rubygems soon.

carlamartinsab commented 8 years ago

We've applied the fix available for the issue ##, but it didn't work as we expected. Apparently the ordering does not change, and words starting with "Á", "Â" or "Ó" still appear at the end of the alphabetical list. And when searching for these words, I can no longer find them...

Below are some quick examples to illustrate what is happening. Concept preferred labels:

  • Amostra de área
  • Área de contato
  • Área de influência
  • Balanceamento

  1. Searched term: "Área"
     Current results: "Amostra de área"
     Expected results: "Amostra de área", "Área de contato", "Área de influência"

  2. Searched term: "área"
     Current results: "Amostra de área"
     Expected results: "Amostra de área", "Área de contato", "Área de influência"

  3. Searched term: "area"
     Current results: (none)
     Expected results: "Amostra de área", "Área de contato", "Área de influência"

We are also facing problems with the sorting mechanism.

Current result:

  • Amostra de área
  • Balanceamento
  • Área de contato
  • Área de influência

Expected result:

  • Amostra de área
  • Área de contato
  • Área de influência
  • Balanceamento
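The expected ordering can be reproduced with an accent-insensitive sort key. The following is a minimal sketch in plain Ruby, assuming the labels are sorted in application code rather than by the database; `fold` is a hypothetical helper and not something iQvoc provides.

```ruby
# Hypothetical helper (not part of iQvoc): strip combining accents and
# lower-case the text so that "Área" and "area" compare as equal.
def fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

# Labels in the order the current sort returns them, as reported above.
labels = [
  "Amostra de área",
  "Balanceamento",
  "Área de contato",
  "Área de influência"
]

# Sort by the folded key; the original label acts as a tie-breaker so that
# labels differing only by accents or case keep a deterministic order.
sorted = labels.sort_by { |label| [fold(label), label] }

puts sorted
```

Sorting on the folded key places both "Área ..." labels between "Amostra de área" and "Balanceamento". This is a simplification: full locale-aware collation has more rules than accent stripping, so a database collation or a dedicated collation library may be the better choice in practice.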

Thanks for your help

mjansing commented 8 years ago

I'll take a look at it.

rbvictor commented 8 years ago

Hi @mjansing, I was looking at issue #351 and an idea occurred to me. Would it be possible to ignore not only case but also accents (special characters) during search, by removing them from both sides of the comparison in self.by_query_value(query) of base.rb?

Something like where(["UNACCENT(LOWER(#{table_name}.value)) LIKE ?", I18n.transliterate(query.mb_chars.downcase.to_s)]) ?

The problem is that the function UNACCENT is specific to PostgreSQL databases. I don't know how to adapt it for other DBs. However, do you think it is possible to do something like this?
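The idea of folding both sides of the comparison can be illustrated without any database. This sketch is an assumption-laden stand-in, not the iQvoc query: `fold` and `matching_labels` are hypothetical names, matching is a simple substring test, and the labels are held in memory instead of being queried through `by_query_value`.

```ruby
# Hypothetical helper (not part of iQvoc): remove combining accents and
# lower-case the text.
def fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

# Hypothetical in-memory search: a label matches when its folded form
# contains the folded query, so accents and case are ignored on both sides.
def matching_labels(labels, query)
  needle = fold(query)
  labels.select { |label| fold(label).include?(needle) }
end

# Sample labels taken from this thread.
labels = [
  "Amostra de área",
  "Área de contato",
  "Área de influência",
  "Balanceamento"
]

# "Área", "área" and "area" should all return the same three labels.
["Área", "área", "area"].each do |query|
  puts "#{query}: #{matching_labels(labels, query).join(', ')}"
end
```

Because the folding happens in Ruby, the same normalized form could in principle be stored alongside each label and compared in any database, which would avoid depending on a PostgreSQL-only function. Whether that fits iQvoc's schema is an open question here.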

mjansing commented 8 years ago

Hmm, I'm not sure I fully understand this issue. Which "view" is affected (hierarchical concepts or alphabetical concepts)?

With the current version of iQvoc and some sample data it looks like this:

hierarchical concepts

[Screenshot of the hierarchical concepts view, 2016-06-23]

alphabetical concepts

[Screenshot of the alphabetical concepts view, 2016-06-23]

carlamartinsab commented 8 years ago

Hello @mjansing,

@rbvictor seems to be right. When searching, if we have two terms like "Água Tratada" and "Agua não Tratada" and I search for "Agua", the first term is not returned. I see only "Agua não Tratada", because "A" and "Á" are treated as different letters of the alphabet.

So if the system simply ignored accents (and, of course, case), that would seem to solve the problem.

Thank you

rbvictor commented 8 years ago

I think there are 2 issues in 1:

  1. Search functionality does not ignore accents in Latin characters: For example, when we search for the correctly spelled word "água", we get the expected results, but when we search for "agua" without the acute accent, there is no result. I think "água" and "agua" should be treated the same way and return the same results, as in most other search tools. The same thing may happen in German with characters like "ä", "ö", "ü", etc.
  2. Alphabetical sort does not ignore accents in Latin characters either: I am currently using the version before fix 426ce2c from this issue. Labels beginning with these special characters appear at the end of the list. This problem occurs on every page that uses this sorting functionality:
    • alphabetical concepts
    • hierarchical concepts
    • order in search results
    • etc.

Do you think it would be better to address these two issues separately?

carlamartinsab commented 8 years ago

I agree with that. And (1), the search functionality, is the more critical one.