infinite-dao / glean-cetaf-rdfs

Collect and glean RDF data in parallel of stable identifiers of the Consortium of European Taxonomic Facilities (CETAF) and prepare them for import into a SPARQL endpoint
GNU General Public License v3.0

General ~ Format RDF the right way (data types, properties) #12

Open infinite-dao opened 1 year ago

infinite-dao commented 1 year ago

Hej-hej!

This is just an example I stumbled over:

Here is the original RDF (from 24.11.2022) of http://data.biodiversitydata.nl/naturalis/specimen/U.1257357:

<rdf:RDF
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="http://data.biodiversitydata.nl/naturalis/specimen/U.1257357">
    <dc:title>Jatropha curcas L.</dc:title>
    <dwc:family>Euphorbiaceae</dwc:family>
    <dwc:recordedBy>Andel, TR van; Roberts, N; Ford, G; George, N</dwc:recordedBy>
    <dwc:fieldNumber>Andel, TR van; Roberts, N; Ford, G; George, N  1279 </dwc:fieldNumber>
    <dwc:decimalLatitude>7.75</dwc:decimalLatitude>
    <dwc:decimalLongitude>-59.5</dwc:decimalLongitude>
    <dwc:associatedMedia>https://medialib.naturalis.nl/file/id/U.1257357/format/large</dwc:associatedMedia>
</rdf:Description>

</rdf:RDF>

During import we do some formatting by adding data types, e.g. to <dwc:decimalLatitude …> (it is not perfect yet):

<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:dwciri="http://rs.tdwg.org/dwc/iri/" > 
  <rdf:Description rdf:about="http://data.biodiversitydata.nl/naturalis/specimen/U.1257357">
    <dwc:institutionID rdf:resource="https://ror.org/0566bfb96"/><!-- added by import: URI formatted -->
    <!-- added by import: set data type decimal -->
    <dwc:decimalLatitude rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">7.75</dwc:decimalLatitude>
    <dwc:decimalLongitude rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">-59.5</dwc:decimalLongitude>
    <!-- 
      dwc:associatedMedia here is a string data type, perhaps it should be rdf:resource="…" 
      <dwc:associatedMedia rdf:resource="https://medialib.naturalis.nl/file/id/U.1257357/format/large"/>
    -->
    <dwc:associatedMedia>https://medialib.naturalis.nl/file/id/U.1257357/format/large</dwc:associatedMedia>
    <dwc:fieldNumber>Andel, TR van; Roberts, N; Ford, G; George, N  1279 </dwc:fieldNumber>
    <dwc:recordedBy>Andel, TR van; Roberts, N; Ford, G; George, N</dwc:recordedBy>
    <dwc:family>Euphorbiaceae</dwc:family>
    <dcterms:title>Jatropha curcas L.</dcterms:title>
  </rdf:Description>
</rdf:RDF>
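
To make the formatting step above concrete, here is a minimal sketch in Python of how such a rewrite could look. It is not the code this repository uses for import. The function name `add_datatypes` and the two property lists are assumptions chosen for illustration only. It sets `rdf:datatype` to `xsd:decimal` on the coordinate properties and, following the idea in the XML comment above, turns an `http(s)` value of `dwc:associatedMedia` into an `rdf:resource` attribute.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical selection of properties, for illustration only.
DECIMAL_PROPERTIES = ("decimalLatitude", "decimalLongitude")
IRI_PROPERTIES = ("associatedMedia",)


def add_datatypes(rdf_xml: str) -> str:
    """Return the RDF/XML with decimal data types and resource IRIs added."""
    # Keep the familiar prefixes in the serialized output.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dwc", DWC_NS)
    root = ET.fromstring(rdf_xml)

    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # Coordinates: keep the literal, add rdf:datatype="…#decimal".
        for name in DECIMAL_PROPERTIES:
            for element in description.findall(f"{{{DWC_NS}}}{name}"):
                element.set(f"{{{RDF_NS}}}datatype", XSD_DECIMAL)

        # Links: move an http(s) literal into rdf:resource="…".
        for name in IRI_PROPERTIES:
            for element in description.findall(f"{{{DWC_NS}}}{name}"):
                value = (element.text or "").strip()
                if value.startswith(("http://", "https://")):
                    element.set(f"{{{RDF_NS}}}resource", value)
                    element.text = None

    return ET.tostring(root, encoding="unicode")
```

Called on the original Naturalis record, `add_datatypes(original_rdf)` would leave `dwc:family`, `dwc:recordedBy` and the other literals untouched. Whether `dwc:associatedMedia` should really become a resource is still the open question of this issue, so that part is best treated as optional.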