lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Identifier subclass for OCLC number #67

Closed by kiegel 6 years ago

kiegel commented 6 years ago

OCLC numbers have a unique place as an identifier in the library bibliographic universe. It is essential to support easy machine processing of them in a linked data environment. The current situation, in which they are tagged with the bf:Local subclass, is not ideal. An alternative is to create a new subclass of bf:Identifier, something like bf:OclcNumber, which would allow OCLC numbers to be selected directly by class and so processed faster.

kiegel commented 6 years ago

To illustrate my point about efficient processing, here is a simple SPARQL query to list OCLC numbers from a set of BIBFRAME triples.

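PREFIX bf:   <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?oclcnumber
WHERE {
  ?s a bf:Local ;
     rdf:value ?oclcnumber ;
     bf:source ?sourceBnode .
  ?sourceBnode rdfs:label ?sourceCode .
  # Keep only identifiers whose source label is a known OCLC spelling
  FILTER (?sourceCode IN ("OCoLC", "OCLC", "oclc"))
}
LIMIT 25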

Since OCLC numbers are mixed with other identifiers in bf:Local, you have to query bf:source. Furthermore, even though the LC converter consistently uses the same label for source, you can't assume that BIBFRAME data in the wild will be this consistent. In fact, you can't anticipate all variants and typos, so this approach is inherently not robust.
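
As a rough illustration of why label matching is fragile, here is a minimal Python sketch that mimics the exact-match FILTER on a handful of made-up identifier records. The records, the variant labels, and the helper name oclc_numbers_by_label are all hypothetical and chosen only for illustration; they are not taken from real converter output.

```python
# Made-up records standing in for bf:Local nodes:
# (rdf:value, rdfs:label of the bf:source node, whether it really is an OCLC number)
IDENTIFIERS = [
    ("ocm00000001", "OCoLC", True),
    ("ocm00000002", "oclc", True),
    ("ocm00000003", "(OCoLC)", True),   # hypothetical variant, not in the filter list
    ("ocm00000004", "OCLC ", True),     # hypothetical typo: trailing space
    ("b0000001", "NjP", False),         # a genuinely local identifier
]

# The same three spellings the SPARQL FILTER accepts
KNOWN_LABELS = {"OCoLC", "OCLC", "oclc"}


def oclc_numbers_by_label(identifiers, labels=KNOWN_LABELS):
    """Mimic FILTER (?sourceCode IN (...)): exact, case-sensitive label match."""
    return [value for value, label, _ in identifiers if label in labels]


found = oclc_numbers_by_label(IDENTIFIERS)

# OCLC numbers that slip past the filter because their label is an unlisted variant
missed = [value for value, _, is_oclc in IDENTIFIERS
          if is_oclc and value not in found]

print("found: ", found)
print("missed:", missed)
```

Each unlisted variant silently drops a real OCLC number from the result, and adding more spellings to the list only helps with the variants you already know about.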

With a subclass defined for OCLC number, the query is much simpler and quite robust.

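PREFIX bf:  <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# bf:OclcNumber is the subclass proposed above
SELECT ?oclcnumber
WHERE {
  ?s a bf:OclcNumber ;
     rdf:value ?oclcnumber .
}
LIMIT 25

kirkhess commented 6 years ago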

Thanks for the SPARQL. I've passed this proposal on to be considered for the next release of the ontology. If it makes it into a specification, we'll update the converter.

We're also planning a separate release of ShEx shapes, which would enable you to validate that your RDF conforms to what we expect. Thanks! Kirk