Closed romanchyla closed 9 years ago
Originally on 2011-01-24
1) When I search for physik fur ingenieure
and for physik f\"ur ingenieure
on CDS, I seem to be getting the same 14 hits. Can you
please make the example more concrete, so that we can see which record
should have been found but was not? Something like recid:124 AND physik
that should have found record 124 but did not, for
example. In any case, I do not seem to be able to reproduce this
problem.
2) Bug reports concerning CDS only, i.e. the CERN instance of Invenio only, are better submitted to the dedicated CERN Savannah support tracker at [[https://savannah.cern.ch/support/?group=cdsware]].
Originally on 2011-01-25
Oops, sorry, my bad. I checked again and the queries are fine, except for this version (which the German user tried first):
Physik fuer Ingenieure and recid:112675
The other two are fine. Interestingly though, the hit counts fall into two groups:
Physik f\"ur --> 974 hits
Physik fur --> 974 hits
Physik fuer --> 510 hits
ps: thank you for the link
Originally on 2011-02-04
OK, so the problem is that this record contains the word für, and that it can be found via fur, but not via fuer.
This is actually how CDS behaves by design: many years ago, in a common discussion with the library on how to index accented letters, it was decided to simply strip Latin-1 accents. Hence für is indexed as fur only, and fuer does not find it.
I agree that we may want to alter this behaviour...
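To illustrate the accent stripping described above, here is a minimal sketch. It is not the actual CDS/Invenio indexing code; `strip_accents` and `index_terms` are hypothetical helpers that only mimic the "strip Latin-1 accents" decision using Python's standard `unicodedata` module.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(title):
    """Hypothetical indexer: lowercase each word and strip its accents."""
    return {strip_accents(word.lower()) for word in title.split()}


indexed = index_terms("Physik für Ingenieure")

# The accent-stripped form is in the index, the "ue" spelling is not.
print("fur" in indexed)
print("fuer" in indexed)
```

Under this assumption, a query for fur (or für, once the query is stripped the same way) matches the record, while fuer is a different term and does not.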
Originally by arwagner on 2013-11-26
I may add that there is another "accented character" of this type entirely missing from the list, namely the German ß (written as ss or sz), except in Switzerland, which I think has abandoned this character.
Especially in names it would be great if one could capture this as well.
@kaplun has there been any progress with custom tokenizers for bibindex
supporting mentioned characters?
I'm not sure whether multiple alternative transliterations for a term are possible with recent INSPIRE improvements... "für -> fur, fuer" is the main issue at hand here. In theory, people could write their own tokenisers to achieve this. In practice, we can muse whether Invenio should do this by default...
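As a rough sketch of what such a custom tokeniser could look like, the snippet below emits both transliterations for each term, so that "für" is indexed as "fur" and as "fuer". It does not use the bibindex tokeniser API; `term_variants` and `GERMAN_EXPANSIONS` are hypothetical names, and the ß -> ss mapping is just one of the possible choices mentioned above.

```python
import unicodedata

# Assumed German transliterations; "sz" for ß is deliberately left out here.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_variants(word):
    """Return every form under which a word would be indexed."""
    # NFC makes sure umlauts are single code points before the lookup.
    word = unicodedata.normalize("NFC", word.lower())
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in word)
    return {strip_accents(word), strip_accents(expanded)}


print(term_variants("für"))
print(term_variants("Straße"))
```

Indexing every variant increases the index size, but queries for fur and fuer would then both find a record containing für. Whether this should be the default behaviour is the open question raised above.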
There is no PR for this issue, hence closing it as per the legacy
code base freeze; it is addressed differently in the master
code base.
Originally on 2011-01-24
In CDS, searching for:
physik fuer ingenieure
physik fur ingenieure
Physik für Ingenieure
does not work; what works is this:
physik f\"ur ingenieure
which is very weird.