periodo / periodo-client

Client to browse and edit PeriodO data
https://client.perio.do
Other

Strip extraneous whitespace from manually entered data #155

Closed: rybesh closed this issue 4 years ago

rybesh commented 6 years ago

Sometimes people editing the sameAs or url fields of period definitions enter trailing whitespace after the URL. We should strip it, since otherwise we end up with invalid URLs in our data, e.g.:

Bad IRI: <https://opencontext.org/types/B7B0F27F-EE67-4AA3-ADDC-6C0B06A9C11F > Spaces are not legal in URIs/IRIs.
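A minimal sketch of the stripping proposed above. It assumes a period definition can be handled as a plain dict in which `sameAs` and `url` each hold either a string or a list of strings; the helper name `strip_url_fields` is hypothetical and is not taken from the periodo-client code.

```python
from typing import Any

# Fields treated as URLs in this sketch (assumption based on the issue text).
URL_FIELDS = ("sameAs", "url")


def strip_url_fields(period: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `period` with leading/trailing whitespace removed from URL fields.

    Hypothetical helper: assumes each URL field is a string or a list of strings.
    The input dict is not modified.
    """
    cleaned = dict(period)
    for field in URL_FIELDS:
        value = cleaned.get(field)
        if isinstance(value, str):
            cleaned[field] = value.strip()
        elif isinstance(value, list):
            # Build a new list so the caller's list is left untouched.
            cleaned[field] = [v.strip() if isinstance(v, str) else v for v in value]
    return cleaned


period = {
    "label": "Example period",
    "sameAs": "https://opencontext.org/types/B7B0F27F-EE67-4AA3-ADDC-6C0B06A9C11F ",
}
print(repr(strip_url_fields(period)["sameAs"]))
```

Note that `str.strip()` only removes whitespace at the ends; a URL with spaces in the middle would still be invalid and would need to be rejected rather than silently cleaned.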
rybesh commented 5 years ago

This is what was causing the Turtle serialization to fail: a URL value with trailing spaces. I'm going to replace rdflib with a subprocess call to riot to avoid this in the future, but we should also be avoiding putting bad URLs in the dataset. Or maybe we should change the type of these values to a string?

rybesh commented 4 years ago

This affects ISO year values too: whitespace at the beginning or end of the value, or whitespace between the minus sign and the digits for years before 0, will cause validation and CSV generation to fail.
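One way to handle the year cases described above, sketched under an assumption: a valid year value is taken to be an optional leading minus sign followed by at least four ASCII digits (e.g. `0500`, `-0399`). The helper name `normalize_iso_year` is hypothetical. It strips outer whitespace, removes whitespace between the minus sign and the digits, and raises if the result still isn't a year.

```python
import re

# Assumed shape of a valid year: optional leading minus, then four or more ASCII digits.
# [0-9] is used instead of \d so that non-ASCII digits are not accepted.
YEAR_PATTERN = re.compile(r"-?[0-9]{4,}")


def normalize_iso_year(raw: str) -> str:
    """Normalize whitespace in a year value, or raise ValueError if it is still invalid.

    Hypothetical helper illustrating the fix discussed in this issue.
    """
    value = raw.strip()
    if value.startswith("-"):
        # Remove whitespace between the minus sign and the digits, e.g. "- 0399".
        value = "-" + value[1:].lstrip()
    if not YEAR_PATTERN.fullmatch(value):
        raise ValueError(f"not an ISO year: {raw!r}")
    return value


print(normalize_iso_year(" - 0399 "))
```

Whitespace inside the digits themselves (e.g. `03 99`) is treated as an error here rather than repaired, since it is not clear what the editor intended.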

atomrab commented 4 years ago

I can confirm that this is a problem I have observed in the wild with the Chronique des Fouilles periodization: we had multiple spatial values that were actually the same, differing only by trailing whitespace. I didn't catch them until after I'd merged the patch, because the difference wasn't visible when I reviewed the submission.
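Because these duplicates are invisible in review, a check like the following could flag them. This is a sketch that assumes the spatial values can be collected as a plain list of strings; the helper name `whitespace_duplicates` is hypothetical. It groups values by their stripped form and reports any group with more than one distinct raw spelling.

```python
from collections import defaultdict


def whitespace_duplicates(values: list[str]) -> dict[str, list[str]]:
    """Group values that differ only by leading/trailing whitespace.

    Returns a mapping from each stripped value to its distinct raw spellings,
    keeping only groups with more than one spelling. Hypothetical helper.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        key = value.strip()
        if value not in groups[key]:
            groups[key].append(value)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


print(whitespace_duplicates(["Athens", "Athens ", "Delphi"]))
```

This only surfaces the problem at review time; stripping values when they are entered would keep such duplicates from being created in the first place.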

rybesh commented 4 years ago

fixed in master