Open jenniferward opened 2 months ago
Additionally, we have about 9,000 duplicates...
Maybe this is happening in all the non-MARCy areas?
https://muscat-test.rism.info/admin/digital_objects/22934/edit
https://muscat-test.rism.info/admin/liturgical_feasts/50001171
It could be. I think they are only stripped in MARC; I imagine this is stuff people copy and paste around?
Yes, 'people' have been known to copy things from anywhere, even from Muscat!
I'm looking at this, it seems that the auth files do not strip the input data. I think we need to do it in two steps:
If my experience helps with cases like this one, I think that it is better to first avoid new cases and then fix the old errors. Because otherwise, there is always the chance a newer one pops up after the correction and before the fix is applied. But of course, you know your workflow better.
Normally I would do the same! But in this case I don't want to make a fix that automatically strips whitespace on save, and then have problems when editors re-save old records that might collide with new ones (triggering the unique constraints). In any case, fixing the data and updating the system will happen at upgrade time, when the system is offline, so there is no risk of this kind of problem. It is mostly to say that this problem will not be fixed in 11 :)
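To illustrate the collision risk, here is a minimal sketch in plain Ruby of how one might list values that would become identical once stripped. It does not touch any Muscat model; `collisions_after_strip` and the sample titles are hypothetical names used only for illustration.

```ruby
# Hypothetical helper, not part of Muscat: groups raw values by their
# stripped form and keeps only the groups with more than one member,
# i.e. the values that would collide under a unique constraint once
# leading/trailing whitespace is removed.
def collisions_after_strip(values)
  values
    .group_by { |v| v.strip }
    .select { |_stripped, originals| originals.size > 1 }
end

# Made-up sample data for the sketch.
titles = [" 12 Sonatas", "12 Sonatas", "Requiem ", "Te Deum"]

collisions = collisions_after_strip(titles)
collisions.each do |stripped, originals|
  puts "#{stripped.inspect} <- #{originals.inspect}"
end
```

Running something like this over the existing data before enabling any strip-on-save would show which records need merging first. Note that `String#strip` only handles ASCII whitespace and null characters, so pasted non-breaking spaces would need separate handling.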
Whitespaces are (conveniently) stripped from Sources at the start or end of a field (see https://github.com/rism-digital/muscat/issues/1409) but apparently they are retained elsewhere in Muscat. I've noticed it in the Titles. This interferes with the alphabetizing and can also inadvertently lead to duplicates.
Standard titles sorted alphabetically: The ones with whitespaces are at the top. https://muscat-test.rism.info/admin/standard_titles?clear_filters=true&order=title_asc
How the first one looks in Edit mode: https://muscat-test.rism.info/admin/standard_titles/50206615/edit
We ended up with
12 Sonatas
with a whitespace at the beginning: https://muscat-test.rism.info/admin/standard_titles/5046265 but also the correct
12 Sonatas
https://muscat-test.rism.info/admin/standard_titles/3911582 The one with the space at the beginning is showing up in Sources (not stripped, even after saving): https://muscat-test.rism.info/admin/sources/1001056785/edit
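A small plain-Ruby sketch of why this happens, assuming a simple byte-wise comparison (the database collation Muscat actually uses may order things differently). The titles are sample values, with "Abendlied" made up for contrast.

```ruby
# Sample values only; "Abendlied" is invented for the illustration.
titles = ["Abendlied", "12 Sonatas", " 12 Sonatas"]

# A leading space (U+0020) compares lower than any digit or letter,
# so the unstripped title floats to the top of an ascending sort.
sorted = titles.sort

# The two "12 Sonatas" strings are different values, so a uniqueness
# check on the raw strings does not see them as duplicates.
raw_unique_count = titles.uniq.size

# After stripping, the two variants collapse into one entry.
normalized = titles.map(&:strip).uniq.sort

puts sorted.inspect
puts raw_unique_count
puts normalized.inspect
```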