rism-digital / muscat

🗂️ A Rails application for the inventory of handwritten and printed music scores
http://muscat-project.org

Whitespaces in authorities are retained in Standard Titles #1575

Open jenniferward opened 2 months ago

jenniferward commented 2 months ago

Whitespace at the start or end of a field is (conveniently) stripped in Sources (see https://github.com/rism-digital/muscat/issues/1409), but apparently it is retained elsewhere in Muscat. I've noticed it in the Titles. This interferes with alphabetizing and can also inadvertently lead to duplicates.

Standard titles sorted alphabetically: the ones with whitespace are at the top. https://muscat-test.rism.info/admin/standard_titles?clear_filters=true&order=title_asc [image]

How the first one looks in Edit mode: https://muscat-test.rism.info/admin/standard_titles/50206615/edit [image]

We ended up with 12 Sonatas with a whitespace at the beginning: https://muscat-test.rism.info/admin/standard_titles/5046265 [image]

but also the correct 12 Sonatas: https://muscat-test.rism.info/admin/standard_titles/3911582

The one with the space at the beginning is showing up in Sources (not stripped, even after saving): https://muscat-test.rism.info/admin/sources/1001056785/edit [image]
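To make the two symptoms concrete, here is a small plain-Ruby sketch (not Muscat code). The "Aria" title is made up for illustration, and the database's own collation may order things differently from Ruby's byte-wise comparison, so treat this only as a model of the effect described above.

```ruby
# Sample titles; " 12 Sonatas" carries the stray leading space from the report.
# "Aria" is a hypothetical title added only to show the ordering.
titles = ["Aria", "12 Sonatas", " 12 Sonatas"]

# Plain Ruby string comparison: the space (0x20) sorts before digits and letters,
# so the padded entry floats to the top of the list.
sorted = titles.sort

# Exact comparison treats the padded and the clean title as different values...
distinct_raw = titles.uniq.size
# ...while comparing stripped values reveals that two of them are the same title.
distinct_stripped = titles.map(&:strip).uniq.size

puts sorted.inspect
puts "distinct as stored: #{distinct_raw}, distinct after strip: #{distinct_stripped}"
```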

lpugin commented 2 months ago

Additionally, we have about 9,000 duplicates...

jenniferward commented 2 months ago

Maybe this is happening in all the non-MARCy areas?

https://muscat-test.rism.info/admin/digital_objects/22934/edit [image]

https://muscat-test.rism.info/admin/liturgical_feasts/50001171 [image]

[image]

https://muscat-test.rism.info/admin/standard_terms/50001820 [image]

[image]

xhero commented 2 months ago

It could be. I think they are only stripped in MARC. I imagine this is stuff people copy and paste around?

jenniferward commented 2 months ago

Yes, 'people' have been known to copy things from anywhere, even from Muscat!

xhero commented 1 month ago

I'm looking at this, it seems that the auth files do not strip the input data. I think we need to do it in two steps:

fjorba commented 1 month ago

> I'm looking at this, it seems that the auth files do not strip the input data. I think we need to do it in two steps:

If my experience with cases like this one helps: I think it is better to first prevent new cases and then fix the old errors. Otherwise, there is always the chance that a new one pops up after the correction and before the fix is applied. But of course, you know your workflow better.
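As a rough illustration of the "avoid new cases" half, the sketch below trims string values before they would be stored. It is plain Ruby with a hypothetical helper name (`strip_string_attributes`), not Muscat's actual model code; in a Rails model the same idea would typically sit in a `before_validation` callback.

```ruby
# Hypothetical helper, not part of Muscat: returns a copy of the attribute
# hash with leading and trailing whitespace removed from every String value.
# Non-string values (nil, integers, ...) are passed through unchanged.
def strip_string_attributes(attributes)
  attributes.transform_values do |value|
    value.is_a?(String) ? value.strip : value
  end
end

# Sample record shaped like the report: a title pasted with stray spaces.
record  = { id: 5046265, title: " 12 Sonatas ", notes: nil }
cleaned = strip_string_attributes(record)

puts cleaned.inspect
```

Note that `String#strip` only removes ASCII whitespace and null characters, so a pasted non-breaking space would need separate handling.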

xhero commented 1 month ago

Normally I would do the same! But in this case I don't want to make a fix that automatically strips whitespace on save, and then have problems when editors save old records that, once stripped, might collide with new ones (triggering the unique constraints). In any case, fixing the data and updating the system will both happen at upgrade time, when the system is offline, so there is no risk of this kind of problem. This is mostly to say that the problem will not be fixed in 11 :)
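The collision concern can be checked ahead of time. The following plain-Ruby sketch groups records by their stripped title and reports the groups that would clash once stripping is enforced. The `[id, title]` pairs stand in for database rows; the first two ids come from this thread, the third row is invented, and `collision_groups` is a hypothetical name rather than anything in Muscat.

```ruby
# Stand-in for standard title rows as [id, title] pairs.
# The third row is made up to show a padded title with no clean counterpart.
rows = [
  [5046265, " 12 Sonatas"],
  [3911582, "12 Sonatas"],
  [1,       " Aria"]
]

# Hypothetical helper: maps each stripped title to the ids that share it,
# keeping only titles claimed by more than one record.
def collision_groups(rows)
  rows.group_by { |_id, title| title.strip }
      .select { |_title, group| group.size > 1 }
      .transform_values { |group| group.map(&:first) }
end

collisions = collision_groups(rows)
puts collisions.inspect
```

Records in such a group would need to be merged or otherwise resolved before a unique constraint on the stripped value could hold; the padded " Aria" row has no counterpart and could simply be trimmed.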