Open dazza-codes opened 7 years ago
one of the subtopics of reconciliation
related to #64, #59
This SPARQL query might help to identify duplicate work URIs by counting the works that share each identifiedBy value:
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?idValue (COUNT(?w) AS ?workCount)
WHERE {
  ?w a bf:Work ;
     bf:adminMetadata ?amd .
  ?amd bf:identifiedBy ?id .
  ?id rdf:value ?idValue .
}
GROUP BY ?idValue
ORDER BY DESC(?workCount)
LIMIT 100
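A possible refinement, offered as a sketch rather than the query that was actually run: adding a HAVING clause returns only the identifier values shared by more than one work, so an empty result set means no duplicates were found. The use of COUNT(DISTINCT ?w) is an assumption added here to avoid counting the same work twice when it matches through several paths.

```sparql
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?idValue (COUNT(DISTINCT ?w) AS ?workCount)
WHERE {
  ?w a bf:Work ;
     bf:adminMetadata ?amd .
  ?amd bf:identifiedBy ?id .
  ?id rdf:value ?idValue .
}
GROUP BY ?idValue
# Keep only identifier values that occur on more than one work.
HAVING (COUNT(DISTINCT ?w) > 1)
ORDER BY DESC(?workCount)
LIMIT 100
```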
Running this on the Casalini data did not identify any duplicate works. A related check counts the works linked to each instance; every count was exactly 1:
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT ?i (COUNT(?w) AS ?workCount)
WHERE {
  ?i bf:instanceOf ?w .
}
GROUP BY ?i
ORDER BY DESC(?workCount)
LIMIT 100
When a MARC record has several fields from which the converter creates instances, all of those instances are linked to the same work; e.g.
<http://ld4p-test.stanford.edu/11347283#Work>
    bf:hasInstance <http://ld4p-test.stanford.edu/11347283#Instance> ;
    bf:hasInstance <http://ld4p-test.stanford.edu/11347283#Instance856-28> ;
    bf:hasInstance <http://ld4p-test.stanford.edu/11347283#Instance856-29> .
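To see how common this pattern is, the following sketch lists works that have more than one instance. It assumes the converter always emits bf:hasInstance on the work, as in the example above; if only bf:instanceOf is present in a given data set, the triple pattern would need to be inverted.

```sparql
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT ?w (COUNT(?i) AS ?instanceCount)
WHERE {
  ?w bf:hasInstance ?i .
}
GROUP BY ?w
# Keep only works linked to two or more instances.
HAVING (COUNT(?i) > 1)
ORDER BY DESC(?instanceCount)
LIMIT 100
```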
This is an example of finding an OCLC number from the 035 field:
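PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT ?id ?p ?o ?sp ?so
WHERE {
  <http://ld4p-test.stanford.edu/11347283#Instance> bf:identifiedBy ?id .
  ?id ?p ?o ;
      bf:source ?s .
  ?s ?sp ?so .
}

The query above lists every property of each identifier and of its source. A narrower version returns only the identifier value and the source label:

PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?idValue ?idSourceLabel
WHERE {
  <http://ld4p-test.stanford.edu/11347283#Instance> bf:identifiedBy ?id .
  ?id rdf:value ?idValue ;
      bf:source ?idSource .
  ?idSource rdfs:label ?idSourceLabel .
}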
We can get RDF from OCLC using this identifier. In the example below, the request returns a 307 redirect to experiment.worldcat.org:
$ curl -i http://www.worldcat.org/oclc/911267839.rdf
HTTP/1.1 307 Temporary Redirect
Date: Thu, 13 Apr 2017 22:20:18 GMT
Server: Apache
Location: http://experiment.worldcat.org/oclc/911267839.rdf
Content-Length: 0
P3P: CP="OCLC"
Content-Type: text/plain
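The single-instance lookup could be generalized to build a WorldCat RDF URL for every instance that carries an OCLC number. This is only a sketch: the source label "OCoLC" is an assumption about what the converter emits for 035 fields and should be checked against the ?idSourceLabel values returned by the query above, and ?oclcRdf is a variable name introduced here for illustration.

```sparql
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?i ?idValue ?oclcRdf
WHERE {
  ?i bf:instanceOf ?w ;
     bf:identifiedBy ?id .
  ?id rdf:value ?idValue ;
      bf:source ?idSource .
  ?idSource rdfs:label ?idSourceLabel .
  # Assumption: OCLC identifiers have a source labeled "OCoLC".
  FILTER (STR(?idSourceLabel) = "OCoLC")
  # Build the same URL form used in the curl example.
  BIND (IRI(CONCAT("http://www.worldcat.org/oclc/", STR(?idValue), ".rdf")) AS ?oclcRdf)
}
LIMIT 100
```

Because the server answers with a redirect, any client fetching these URLs would need to follow it (for curl, the -L option).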