skohub-io / skohub-vocabs

A lightweight tool to publish SKOS Vocabularies
https://skohub.io/
Apache License 2.0

Umlauts are not indexed properly #164

Closed: acka47 closed this issue 1 year ago

acka47 commented 2 years ago
  1. Umlauts at the beginning of a string are ignored. For example, go to https://w3id.org/kim/hochschulfaechersystematik/scheme and type in "Über". This is what you get: image
  2. Although we do not use a forward tokenizer yet (see #153), words that contain an Umlaut but do not start with one are indexed as two words. Typing "ühgesch" or "hgesch" at https://w3id.org/kim/hochschulfaechersystematik/scheme results in something like this: image
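
Both symptoms are what one would expect if the indexer splits text on an ASCII-only word boundary. The following is a minimal sketch of that assumption, not code from skohub-vocabs or its search library. The function names `tokenizeAscii` and `tokenizeUnicode` and the sample word "Frühgeschichte" are made up for illustration.

```javascript
// Hypothetical tokenizers for illustration only; they are NOT taken from
// skohub-vocabs or from the search library it uses.

// ASCII-only: without the "u" flag, \W matches every character outside
// [A-Za-z0-9_], so an umlaut is treated like a separator.
function tokenizeAscii(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Unicode-aware: split only on characters that are neither letters nor digits.
function tokenizeUnicode(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

console.log(tokenizeAscii("Frühgeschichte"));   // [ 'fr', 'hgeschichte' ]
console.log(tokenizeAscii("Über"));             // [ 'ber' ]
console.log(tokenizeUnicode("Frühgeschichte")); // [ 'frühgeschichte' ]
```

With the ASCII-only split, a leading umlaut disappears and an inner umlaut cuts the word in two, which would match points 1 and 2. Whether this is the actual cause here is an assumption.
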
acka47 commented 2 years ago

186 is now deployed on test, and I had this vocabulary rebuilt: https://test.skohub.io/acka47/testing-skohub-vocabs/heads/master/index.de.html

Umlauts still don't work, but now they fail in a different way than before:

image

They are apparently no longer ignored but are transformed to their corresponding vowel:

image
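
A transformation of this kind is typical of diacritic folding. The sketch below shows one common way to do it in JavaScript and why it only helps if the query is folded in the same way as the indexed label. The helpers `foldDiacritics` and `matches` and the sample word are hypothetical. This is not the code path used by skohub-vocabs.

```javascript
// Hypothetical helper: decompose characters (NFD) and drop the combining
// marks, so "ü" becomes "u". Illustration only, not skohub-vocabs code.
function foldDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Fold BOTH the label and the query before comparing them.
function matches(label, query) {
  const foldedLabel = foldDiacritics(label).toLowerCase();
  const foldedQuery = foldDiacritics(query).toLowerCase();
  return foldedLabel.includes(foldedQuery);
}

console.log(foldDiacritics("Frühgeschichte"));  // Fruhgeschichte
console.log(matches("Frühgeschichte", "früh")); // true
console.log(matches("Frühgeschichte", "fruh")); // true

// If only the indexed label is folded, a query containing the umlaut misses:
console.log(foldDiacritics("Frühgeschichte").toLowerCase().includes("früh")); // false
```

Folding on both sides makes "früh" and "fruh" equivalent, which may or may not be desired for German labels. Keeping the umlauts and using a Unicode-aware tokenizer is the alternative.
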

sroertgen commented 2 years ago

As mentioned in https://github.com/skohub-io/skohub-vocabs/pull/197#issue-1326736011, uppercase umlauts should now work.

They are apparently no longer ignored but are transformed to their corresponding vowel:

This is still the case and requires some adjustments in either App.js or the nestedList.js component. So #197 will not close this issue, but it will bring some improvement.

acka47 commented 2 years ago

Functional review for #197:

As @sroertgen says, this is an improvement, as uppercase umlauts now also work. (I noticed that my "Über" example in https://github.com/skohub-io/skohub-vocabs/issues/164#issuecomment-1202724985 isn't a good one, as there is no "Über" in the Hochschulfächerklassifikation.) image

The change from #197 can be deployed to production; I will open a PR against master. Beyond that, I agree with @sroertgen that more work is needed to handle umlauts correctly and close this issue.

acka47 commented 2 years ago

Reopening, as #197 doesn't completely fix the problem (see the two comments above by me and @sroertgen).