silknow / converter

SILKNOW converter that harmonizes all museum metadata records into the common SILKNOW ontology model (based on CIDOC-CRM)
Apache License 2.0

Imatex: Original dataset has Strings/Entries that contain mixed languages #24

Open tschleider opened 5 years ago

tschleider commented 5 years ago

(Originally part of #22)

The original dataset is not consistent. The original language seems to be Catalan, but some fields in the English version still contain Catalan text, or even a mix of Catalan and English. For example, take the first object returned when searching for textile pieces in the English version of their website (Reg. nr. 108):

TÈCNICA* | brocaded - / brocading weft - / trama llançada - / patterned fabric -

http://imatex.cdmt.cat/_cat/pubindex.aspx

rtroncy commented 3 years ago

After reading #22 and this issue more carefully, it seems to me that the values of some IMATEX fields are improperly language tagged. More precisely, we cannot rely exclusively on the JSON file name suffix (_ca, _es or _en) to be sure that the strings are written in Catalan, Spanish and English respectively. Worse, I understand that some values even mix languages. But all values are represented in the KG, right? If this is the only remaining issue, then I'm not sure it is worth addressing. We could always rely on a language detection library to verify the language of the content; @ehrhart has done this in the past.
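
To make the suggestion concrete, here is a minimal Python sketch of such a check, applied to the TÈCNICA value quoted above. It is not the converter's code. The function names, the ` - / ` splitting rule (inferred from that single example) and the tiny marker-word lists are all assumptions made for illustration. `guess_language` is a toy stand-in for a real language detection library, which could be passed in through the `detect` parameter.

```python
import re

# Hypothetical marker words, chosen only so that this example runs.
# A real check would call a language detection library instead.
MARKERS = {
    "ca": {"trama", "llançada", "ordit", "teixit", "seda"},
    "en": {"brocaded", "brocading", "weft", "warp", "patterned", "fabric", "silk"},
}


def guess_language(text):
    """Return the language whose marker words occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in MARKERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def split_values(field):
    """Split a multi-valued field on the ' - / ' separator (assumed from the example)."""
    parts = re.split(r"\s*-\s*/\s*", field)
    return [p.strip(" -") for p in parts if p.strip(" -")]


def mismatched_values(field, expected, detect=guess_language):
    """Return (value, detected) pairs whose detected language is not `expected`."""
    result = []
    for value in split_values(field):
        detected = detect(value)
        # Values the detector cannot classify are left alone, not flagged.
        if detected is not None and detected != expected:
            result.append((value, detected))
    return result


technique = "brocaded - / brocading weft - / trama llançada - / patterned fabric -"
print(mismatched_values(technique, expected="en"))
```

Checking each sub-value separately, rather than the whole string, is what lets a mixed entry be reported as "mostly English with one Catalan term". Values this short are hard for statistical detectors to classify reliably, so the output is probably better treated as a hint for review than as a corrected language tag.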