lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

standardize treatment of URIs #36

Closed jodiw01 closed 7 years ago

jodiw01 commented 7 years ago

From work on conversion of MARC tag 041, Wayne reports that the conversion program treats URIs in three different ways:

From what I can see, with this we are handling URIs as values in three different ways in the conversion.

  1. Create a bf:identifiedBy property with a bf:Identifier object that has an untyped rdf:value property of the URI ($0 or $w)
  2. Create an additional rdfs:label property with datatype xs:anyURI directly on the appropriate object (5XX $u)
  3. Create an untyped rdf:value directly on the appropriate object (041)

Can we come up with a consistent way of representing this? To me, the clearest thing would be to use the bf:identifiedBy property, like so (using the 041 example above):

  bf:language [
    a bf:Language;
    bf:part "original";
    bf:identifiedBy [
      a bf:Identifier;
      rdf:value "http://id.loc.gov/vocabulary/languages/eng"^^xs:anyURI
    ]
  ] .

And the conversion spec would change to

  W - language - Language - identifiedby - Identifier - rdf:value "URI"^^xs:anyURI

The spec for the $0/$w would change to add the datatype to the rdf:value, and the specs for the 5XX $u would change to add the identifiedBy property and to use rdf:value instead of rdfs:label as the property holding the URI.

If adding the identifiedBy property is one layer of indirection too many (well, actually, two layers), or I'm misusing the property, I'm OK with just using rdf:value -- but I do think we should be consistent, and change the specs for $0/$w and 5XX $u to use rdf:value with the xs:anyURI datatype, and maybe remove the identifiedBy property created by the $0/$w (just put the rdf:value directly on the object in question).

---
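For comparison, the three existing treatments listed above might look roughly like this in Turtle. This is only an illustrative sketch, not actual converter output: the `ex:` subjects, the `example.org` URIs, the note text, and the choice of `bf:Agent` and `bf:Note` as the "appropriate object" are assumptions made for the example, and the `xs:` prefix is bound here to the XML Schema namespace.

```turtle
@prefix bf:   <http://id.loc.gov/ontologies/bibframe/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs:   <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace for the examples

# Pattern 1 ($0 or $w): bf:identifiedBy with a bf:Identifier holding
# the URI as an untyped rdf:value literal.
ex:agent1 a bf:Agent ;
  bf:identifiedBy [
    a bf:Identifier ;
    rdf:value "http://example.org/authority/1"
  ] .

# Pattern 2 (5XX $u): an additional rdfs:label typed as xs:anyURI,
# placed directly on the object (a bf:Note is assumed here).
ex:work2 bf:note [
    a bf:Note ;
    rdfs:label "Hypothetical note text" ;
    rdfs:label "http://example.org/finding-aid"^^xs:anyURI
  ] .

# Pattern 3 (041): an untyped rdf:value directly on the object.
ex:work3 bf:language [
    a bf:Language ;
    bf:part "original" ;
    rdf:value "http://id.loc.gov/vocabulary/languages/eng"
  ] .
```

Seen side by side, the same kind of value ends up under two different properties (rdf:value and rdfs:label), with and without a datatype, and at two different depths, which is the inconsistency described above.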

Consistent treatment would be good

wafschneider commented 7 years ago

FWIW, as implemented in 58de81de212a7d2c0597314c85aba1c3c90c0d6f, the conversion is actually:

<Work>
  bf:language [
    a bf:Language;
    bf:part "original";
    bf:identifiedBy [
      a bf:Identifier;
      rdf:value <http://id.loc.gov/vocabulary/languages/eng>
    ]
  ] .

And the conversions for the 041 and for URIs in $0/$w have been aligned to match this pattern. Conversions for 5XX $u have not been changed.
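Note that the implemented form differs from the original proposal in one respect: the value of rdf:value is an IRI node rather than a literal typed as xs:anyURI. A minimal sketch of the two forms follows; the `ex:` subjects are hypothetical and the `xs:` prefix is assumed to be bound to the XML Schema namespace.

```turtle
@prefix bf:  <http://id.loc.gov/ontologies/bibframe/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xs:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace for the examples

# As proposed in the issue: the URI is a typed literal.
ex:languageAsProposed a bf:Language ;
  bf:identifiedBy [
    a bf:Identifier ;
    rdf:value "http://id.loc.gov/vocabulary/languages/eng"^^xs:anyURI
  ] .

# As implemented: the URI is an IRI node, not a literal.
ex:languageAsImplemented a bf:Language ;
  bf:identifiedBy [
    a bf:Identifier ;
    rdf:value <http://id.loc.gov/vocabulary/languages/eng>
  ] .
```

The two are distinct RDF terms, so a consumer that matches on the literal form will not match the IRI form, and vice versa.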

kirkhess commented 7 years ago

The discussion is related to #7 and #10 - we'll work on the specs and docs to reflect this pattern.