Open jenniferward opened 1 month ago
Some characters need to be normalized (smart quotes vs. apostrophes), but some distinctions need to be preserved (u vs. ü). Currently, ’ and ' are read as different punctuation marks. This causes misalignment in city names in Institutions:
La Seu d’Urgell https://rism.online/institutions/30079707
La Seu d'Urgell https://rism.online/institutions/30005481
and duplicates in Titles/Texts ( https://muscat.rism.info/admin/standard_titles?utf8=%E2%9C%93&q%5Btitle_equals%5D=Au+sein+des+alarmes+l%E2%80%99amour+a+des+charmes&commit=Filter&order=id_desc ):
Au sein des alarmes l’amour a des charmes
Au sein des alarmes l'amour a des charmes
This arises especially when copying from websites or data imports. The problem has been solved with searching (see https://github.com/rism-digital/muscat/issues/622 ) but not on the input side.
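To illustrate why the two institution names end up as separate entries, here is a minimal Python sketch (not Muscat code; the variable names are only for illustration). It shows that the two spellings differ by a single code point, and that standard Unicode normalization alone does not merge them.

```python
import unicodedata

typographic = "La Seu d\u2019Urgell"  # copied from a website: U+2019
plain = "La Seu d'Urgell"             # typed on a keyboard: U+0027

# The strings look alike but do not compare equal.
same = typographic == plain

# List the names of the characters that differ, position by position.
differences = [
    (unicodedata.name(a), unicodedata.name(b))
    for a, b in zip(typographic, plain)
    if a != b
]

# NFKC normalization leaves U+2019 untouched, so it does not help here.
same_after_nfkc = unicodedata.normalize("NFKC", typographic) == plain

print(same, differences, same_after_nfkc)
```

Because Unicode normalization does not fold the typographic apostrophe into the plain one, an explicit replacement step on input would be needed.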
I can think of the following:
’ and '
" " and “ ”
- and – and — (dash, n-dash, m-dash)
For the dashes, only one is needed (the plain dash, I think?) in the standardized fields.
What about spaces? Sometimes they behave strangely (Excel doesn't always read the spaces as spaces), but I can't describe it further than that.
This is most important for the fields that are linked to authority files, not everywhere (for example, not in notes fields).
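As a rough sketch of what input-side normalization could look like, the Python below maps the characters listed above to their plain equivalents while leaving letters such as ü alone. The function name `normalize_authority_value` and the mapping table are hypothetical, not part of Muscat. Folding the n-dash and m-dash into the plain dash follows the tentative suggestion above, and the no-break space entry is only a guess at one cause of the "strange" spaces.

```python
import unicodedata

# Hypothetical mapping; which characters to fold is a policy decision.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (smart apostrophe)
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
    "\u2013": "-",   # n-dash
    "\u2014": "-",   # m-dash
    "\u00A0": " ",   # no-break space (assumption: one source of odd spaces)
})


def normalize_authority_value(value: str) -> str:
    """Fold typographic punctuation to plain forms; keep letters intact."""
    # NFC composes "u" + combining diaeresis into "ü", so accented letters
    # are stored consistently and stay distinct from unaccented ones.
    composed = unicodedata.normalize("NFC", value)
    return composed.translate(PUNCTUATION_MAP)


print(normalize_authority_value("La Seu d\u2019Urgell"))
print(normalize_authority_value("Zu\u0308rich"))
```

A step like this would be applied only to the fields linked to authority files, leaving notes fields as entered.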