rybesh closed this issue 4 years ago
This is what was causing the Turtle serialization to fail: a URL value with some spaces at the end. I'm going to replace rdflib with a subprocess call to riot to avoid this in the future, but we should also be avoiding putting bad URLs in the dataset. Or maybe we should change the type of these values to a string?
This affects ISO year values too: whitespace at the beginning or end of the value, or whitespace between the minus sign and the digits for years before 0, will cause validation and CSV generation to fail.
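A minimal sketch of the kind of cleanup that would cover both cases, assuming the values arrive as plain strings. The helper names `clean_url` and `clean_year` are hypothetical and not taken from the project's code:

```python
import re

# Hypothetical helpers for illustration; not part of the project's codebase.
# Optional sign, optional whitespace, then one or more digits.
_YEAR = re.compile(r"^([+-]?)\s*(\d+)$")


def clean_url(value: str) -> str:
    """Strip leading and trailing whitespace from a URL value."""
    return value.strip()


def clean_year(value: str) -> str:
    """Strip surrounding whitespace and any whitespace between sign and digits."""
    stripped = value.strip()
    match = _YEAR.match(stripped)
    if match is None:
        # Leave unexpected input as-is so validation can still report it.
        return stripped
    sign, digits = match.groups()
    return sign + digits


print(repr(clean_url("http://example.org/period  ")))
print(repr(clean_year(" - 0450 ")))
```

Values that do not look like a signed year are only stripped, not rewritten, so real validation errors still surface instead of being hidden.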
I can confirm that I have observed this problem in the wild with the Chronique des Fouilles periodization: we had multiple spatial values that were actually the same but looked distinct because of trailing whitespace. I didn't catch them until after I'd merged the patch because the difference wasn't visible when I reviewed the submission.
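One way to catch this before merging might be to group values by their stripped form and flag any group with more than one raw spelling. This is only a sketch; `whitespace_duplicates` is a hypothetical helper and the sample values are made up:

```python
from collections import defaultdict


def whitespace_duplicates(values):
    """Return {stripped value: raw variants} for values differing only by surrounding whitespace."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip()].add(value)
    # Keep only the groups where more than one raw spelling was seen.
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


# Made-up sample data; repr() makes the otherwise invisible whitespace visible.
for key, variants in whitespace_duplicates(["Greece", "Greece ", "Crete"]).items():
    print(key, [repr(v) for v in variants])
```

Printing the variants with `repr()` is the point here, since trailing whitespace is not visible in an ordinary review of a submission.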
fixed in master
Sometimes people editing the `sameAs` or `url` fields of period definitions enter whitespace at the end of the URL. We should be stripping this, as we are ending up with invalid URLs in our data, e.g.: