Closed kiegel closed 6 years ago
To illustrate my point about efficient processing, here is a simple SPARQL query to list OCLC numbers from a set of BIBFRAME triples.
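    PREFIX bf:   <http://id.loc.gov/ontologies/bibframe/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?oclcnumber
    WHERE {
      ?s a bf:Local ;
         rdf:value ?oclcnumber ;
         bf:source ?sourceBnode .
      ?sourceBnode rdfs:label ?sourceCode .
      FILTER (?sourceCode IN ("OCoLC", "OCLC", "oclc"))
    }
    LIMIT 25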
Since OCLC numbers are mixed with other identifiers in bf:Local, you have to query bf:source. Furthermore, even though the LC converter consistently uses the same label for source, you can't assume that BIBFRAME data in the wild will be this consistent. In fact, you can't anticipate all variants and typos, so this approach is inherently not robust.
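Normalizing the label before comparing it narrows the problem but does not remove it. A minimal sketch, assuming the source labels are plain literals and that the prefix IRIs match your data:

```sparql
PREFIX bf:   <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?oclcnumber
WHERE {
  ?s a bf:Local ;
     rdf:value ?oclcnumber ;
     bf:source ?source .
  ?source rdfs:label ?sourceCode .
  # Fold case so that "OCoLC", "OCLC" and "oclc" all match.
  FILTER (LCASE(STR(?sourceCode)) IN ("ocolc", "oclc"))
}
LIMIT 25
```

This still misses typos, and it misses any data where the source is identified by an IRI instead of a label, so every consumer would have to maintain its own list of variants.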
With a subclass defined for OCLC number, the query is much simpler and quite robust.
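    PREFIX bf:  <http://id.loc.gov/ontologies/bibframe/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?oclcnumber
    WHERE {
      ?s a bf:OclcNumber ;
         rdf:value ?oclcnumber .
    }
    LIMIT 25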
Thanks for the SPARQL - I've passed this proposal on to be considered for the next release of the ontology. If it makes it into a specification, we'll update the converter.
We're also planning a separate release of ShEx shapes, which would enable you to validate that RDF conforms to what we are expecting. Thanks! Kirk
OCLC numbers have a unique place as an identifier in the library bibliographic universe. It is essential to support easy machine processing of them in a linked data environment. The current situation, where they are tagged with the bf:Local subclass, is not ideal. An alternative is to create a new subclass of bf:Identifier, something like bf:OclcNumber, which would allow faster and simpler processing.
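If such a subclass were adopted, existing data could in principle be retyped with a SPARQL Update. This is only a sketch: bf:OclcNumber is the hypothetical class proposed in this issue, and the list of source labels is an assumption about what the data contains.

```sparql
PREFIX bf:   <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Retype identifiers whose source label looks like OCLC.
# bf:OclcNumber is hypothetical (the class proposed in this issue).
DELETE { ?id a bf:Local }
INSERT { ?id a bf:OclcNumber }
WHERE {
  ?id a bf:Local ;
      bf:source ?source .
  ?source rdfs:label ?sourceCode .
  FILTER (LCASE(STR(?sourceCode)) IN ("ocolc", "oclc"))
}
```

Doing the label matching once, at conversion time, means downstream queries only need to test the class.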