rism-digital / muscat

🗂️ A Rails application for the inventory of handwritten and printed music scores
http://muscat-project.org

Normalize punctuation on input #1599

Open jenniferward opened 1 month ago

jenniferward commented 1 month ago

Some characters need to be normalized (smart quotes vs. apostrophes), but some need to be allowed (u vs. ü). Currently, ’ and ' are read as different punctuation marks. This causes misalignment in city names in Institutions:

La Seu d’Urgell https://rism.online/institutions/30079707
La Seu d'Urgell https://rism.online/institutions/30005481

and duplicates in Titles/Texts:

https://muscat.rism.info/admin/standard_titles?utf8=%E2%9C%93&q%5Btitle_equals%5D=Au+sein+des+alarmes+l%E2%80%99amour+a+des+charmes&commit=Filter&order=id_desc
Au sein des alarmes l’amour a des charmes
Au sein des alarmes l'amour a des charmes

This arises especially when copying from websites or during data imports. The problem has been solved for searching (see https://github.com/rism-digital/muscat/issues/622 ) but not on the input side.
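
To illustrate what input-side normalization of quotes could look like, here is a minimal Ruby sketch. It is not Muscat code: the `QUOTE_MAP` constant and the `normalize_quotes` helper are hypothetical names, and the choice to also map double smart quotes is an assumption. Only the listed characters are replaced, so letters such as ü are left untouched.

```ruby
# Hypothetical helper, not part of Muscat: maps typographic quotes to ASCII.
QUOTE_MAP = {
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark (typographic apostrophe)
  "\u201C" => '"',  # left double quotation mark (assumption: also normalized)
  "\u201D" => '"'   # right double quotation mark (assumption: also normalized)
}.freeze

def normalize_quotes(text)
  # Only the characters listed in QUOTE_MAP are replaced;
  # everything else, including letters such as "ü", passes through unchanged.
  text.gsub(/[\u2018\u2019\u201C\u201D]/, QUOTE_MAP)
end

normalize_quotes("La Seu d\u2019Urgell")  # => "La Seu d'Urgell"
```

With something like this applied before saving, both spellings of La Seu d'Urgell would be stored identically.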

I can think of the following:

For the dashes, only one is needed (the dash I think?) in the standardized fields.
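
A possible sketch for the dashes, in Ruby. Which dash to keep is still an open question here, so the target is a parameter; the default (ASCII hyphen-minus) is only an assumption, and `DASH_PATTERN` and `normalize_dashes` are hypothetical names.

```ruby
# Hypothetical helper, not part of Muscat.
# Matches: hyphen (U+2010), non-breaking hyphen (U+2011), figure dash (U+2012),
# en dash (U+2013), em dash (U+2014) and minus sign (U+2212).
DASH_PATTERN = /[\u2010\u2011\u2012\u2013\u2014\u2212]/

def normalize_dashes(text, target = "-")
  # Assumption: the ASCII hyphen-minus is the single dash to keep.
  # Pass a different target if another dash is chosen.
  text.gsub(DASH_PATTERN, target)
end

normalize_dashes("1750\u20131760")  # => "1750-1760"
```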

What about spaces? They sometimes behave strangely (Excel doesn't always read the spaces as spaces), but I can't describe it further than that.
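
One likely cause, though this is an assumption, is non-breaking or other non-ASCII space characters coming from spreadsheets and web pages. The Ruby sketch below uses two hypothetical helpers: `describe_codepoints` to show which characters a pasted value really contains, and `normalize_spaces` to turn every run of whitespace into a single ASCII space. Note that it also converts tabs and line breaks, so it is only suitable for single-line fields.

```ruby
# Hypothetical helpers, not part of Muscat.

def describe_codepoints(text)
  # Lists each character as a Unicode code point, which helps to identify
  # "spaces" that are not the ordinary U+0020.
  text.codepoints.map { |cp| format("U+%04X", cp) }
end

def normalize_spaces(text)
  # [[:space:]] matches Unicode whitespace in UTF-8 strings, including the
  # no-break space (U+00A0), unlike \s, which only matches ASCII whitespace.
  text.gsub(/[[:space:]]+/, " ").strip
end

describe_codepoints("a\u00A0b")              # => ["U+0061", "U+00A0", "U+0062"]
normalize_spaces("La\u00A0Seu  d'Urgell ")   # => "La Seu d'Urgell"
```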

This is most important for the fields that are linked to authority files; it does not need to apply everywhere (e.g. notes fields).
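
Restricting the normalization to selected fields could be sketched as follows in plain Ruby. The field names in `AUTHORITY_FIELDS` are made up for illustration and do not reflect Muscat's actual attributes, and `normalize_punctuation` stands in for whatever rules are finally agreed on (here it only handles single smart quotes).

```ruby
# Hypothetical field names; Muscat's actual attribute names may differ.
AUTHORITY_FIELDS = %i[place title].freeze

def normalize_punctuation(text)
  # Placeholder rule set: only single smart quotes are handled in this sketch.
  text.gsub(/[\u2018\u2019]/, "'")
end

def normalize_record(attributes)
  # Returns a new hash; only string values in authority-linked fields change.
  attributes.each_with_object({}) do |(field, value), result|
    normalize = AUTHORITY_FIELDS.include?(field) && value.is_a?(String)
    result[field] = normalize ? normalize_punctuation(value) : value
  end
end

record = { place: "La Seu d\u2019Urgell", notes: "quoted \u2018as found\u2019" }
normalize_record(record)[:place]  # => "La Seu d'Urgell"
```

The `notes` value is returned unchanged, so free-text fields keep whatever punctuation was entered.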