ualbertalib / metadata

UAL metadata team's repository

OAI-ORE for harvesting from ERA #97

Closed sfarnel closed 6 years ago

sfarnel commented 7 years ago

Investigate OAI-ORE specifications for metadata + object harvesting to understand metadata implications.

See https://www.openarchives.org/ore/

sfarnel commented 7 years ago

This looks like it might be helpful: http://ai2-s2-pdfs.s3.amazonaws.com/231f/3f36d916555b16a617d627b3450702bbe94d.pdf

And this: https://groups.google.com/forum/#!topic/oai-ore/vRhnTCQiVpM

And this: http://www.greynet.org/images/GL14-S1P,_Bardi_et_al.pdf

johnhuck commented 7 years ago

Here is an ALA tech report on OAI-ORE. It's a decent general introduction that supplements the primer on the openarchives website. I also have a paper copy if ppl find the occasional random font variations too distracting in the e-book.

zschoenb commented 7 years ago

I will add this presentation and this code4lib article. The presentation gives a higher-level overview of mapping ORE to community/collections/items in DSpace. The code4lib article, under the 'Fedora export plug-in' and 'Fedora import plug-in' headings, describes how to parse the Fedora export RDF with ORE and then reingest it. They use RDFLib in Python for the parsing, which is what I used to parse the Convocation Hall into RDF. It's an interesting workflow. And... here is a plugin for generating ORE ReMs from a Fedora 3 database (looking for something up to date)

sfarnel commented 7 years ago

Details from @zschoenb's investigations:

The most common way of making OAI-ORE discoverable is by exposing it through OAI-PMH. This is achieved by essentially pointing the PMH record at the ORE resource map. At least one institution (Texas Digital Library) uses ORE to aggregate resources and exposes those aggregations through PMH.

The mechanisms for exposing ORE directly are varied and appear to be overlapping. They include:

1) HTTP 303 redirection (with or without content negotiation)
2) Encoding into XHTML using RDFa
3) Embedding HTML link elements or using custom HTTP response headers
4) Using a “hash URI”
5) Exposure via Atom XML
6) Exposure via a “splash page” or Sitemap (possibly through ResourceSync?)

IMPORTANT: We have not identified a live example of OAI-ORE dissemination in a repository environment through a mechanism other than OAI-PMH. Exposing OAI-ORE directly requires LAC to identify which of the above models it prefers (i.e. Atom XML vs RDFa, etc.). We can move forward from there.

Resources:

OAI-ORE through OAI-PMH at Texas Digital Library: https://smartech.gatech.edu/bitstream/handle/1853/28448/83-378-1-PB.pdf and http://ai2-s2-pdfs.s3.amazonaws.com/231f/3f36d916555b16a617d627b3450702bbe94d.pdf

OAI-ORE dissemination: Witt, Michael (2010). “Object Reuse and Exchange (OAI-ORE)” Library Technology Reports 46:4

ORE 1.0 spec www.openarchives.org/ore/1.0/discovery

ORE 1.0 spec http://www.openarchives.org/ore/1.0/http#WithMicroformats
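To make option 3 (HTML link elements) a bit more concrete, here is a minimal sketch of how a harvester might look for a resource map link in a splash page. It assumes the page advertises the ReM with a `link` element whose `rel` is `resourcemap`, which is how I read the ORE 1.0 discovery document. The sample page, its URLs, and the names `ResourceMapLinkFinder` and `find_resource_maps` are made up for illustration and do not come from any real repository.

```python
from html.parser import HTMLParser


class ResourceMapLinkFinder(HTMLParser):
    """Collect <link rel="resourcemap"> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.resource_maps = []  # list of (href, type) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "resourcemap" in rel_tokens and attributes.get("href"):
            self.resource_maps.append(
                (attributes["href"], attributes.get("type"))
            )


def find_resource_maps(html_text):
    """Return (href, type) pairs for every resource map link found."""
    finder = ResourceMapLinkFinder()
    finder.feed(html_text)
    return finder.resource_maps


# Hypothetical splash page, invented for this example.
SAMPLE_SPLASH_PAGE = """
<html><head>
<title>Example item</title>
<link rel="stylesheet" href="/style.css">
<link rel="resourcemap" type="application/atom+xml"
      href="http://example.org/items/7/ore.xml">
</head><body></body></html>
"""

if __name__ == "__main__":
    for href, media_type in find_resource_maps(SAMPLE_SPLASH_PAGE):
        print(href, media_type)
```

The `type` attribute is kept alongside the `href` because the serialization (Atom XML vs RDFa, etc.) is exactly the choice LAC still has to make.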

sfarnel commented 7 years ago

Further info from LAC: Texas Digital Library is the best working example of the architecture we would be working towards. We have been able to harvest ORE from the University of Regina. Their “Data Provider” can be found here: http://ourspace.uregina.ca/oai/request?verb=ListMetadataFormats It looks to be using an Atom-serialized ORE ReM for dissemination (and harvesting).
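As a rough sketch of what harvesting through OAI-PMH involves on the client side, the snippet below builds the request URLs and pulls the metadata prefixes out of a `ListMetadataFormats` response. The base URL is the UofR one quoted above. The `ore` metadata prefix and the sample response are assumptions for illustration (I have not checked what the UofR provider actually returns), and the helper names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_oai_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


def list_metadata_prefixes(response_xml):
    """Extract metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hypothetical, heavily trimmed response; not copied from a live provider.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>ore</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

if __name__ == "__main__":
    base = "http://ourspace.uregina.ca/oai/request"
    print(build_oai_request(base, "ListMetadataFormats"))
    # Assumes the provider exposes ORE under the prefix "ore".
    print(build_oai_request(base, "ListRecords", metadataPrefix="ore"))
    print(list_metadata_prefixes(SAMPLE_RESPONSE))
```

A harvester would fetch the first URL, check that an ORE prefix is on offer, and only then issue `ListRecords` or `GetRecord` with that prefix.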

Example from UofR:

http://hdl.handle.net/10294/7/ore.xml 2008-02-11T22:05:40Z 2008-02-11T22:05:40Z oURspace Policy and practice in education : Vol. 11, No. 1/2 2008-02-11T22:05:40Z LICENSE ORIGINAL TEXT

Questions we've posed:

- will you harvest the ETDMS record for the full metadata record, and then harvest the ORE record in order to retrieve the bitstreams? (i.e., the absence of bibliographic metadata in this ORE record is not a concern)
- do you require handles, or could we use DOIs in the various atom identifier fields?
- in the example http://hdl.handle.net/10294/7/ore.xml doesn't resolve (though http://hdl.handle.net/10294/7 does, as you'd expect). Does this matter?
- do you have suggestions for non-DSpace-based values for rdf:resource? Or do you require the DSpace ones?
- we don't have separate TEXT bitstreams. Would LICENSE and ORIGINAL be sufficient?

@danydvd can you spend some time looking further into the exposure through ATOM xml option?
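For the bitstream question, here is a minimal sketch of how a harvester could pull the aggregated resources out of an Atom-serialized ReM. It assumes aggregated resources appear as `atom:link` elements whose `rel` is the ORE `aggregates` term, as in the ORE 1.0 Atom serialization. The sample entry is invented (example.org URLs, made-up titles) and is not the UofR record; `aggregated_resources` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregated_resources(atom_xml):
    """Return (href, title, type) for each aggregated resource link."""
    root = ET.fromstring(atom_xml)
    resources = []
    # iter() also finds links when the entry is wrapped in a feed.
    for link in root.iter("{%s}link" % ATOM_NS):
        if link.get("rel") == ORE_AGGREGATES:
            resources.append(
                (link.get("href"), link.get("title"), link.get("type"))
            )
    return resources


# Hypothetical ReM, invented for this example.
SAMPLE_REM = """
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>http://example.org/items/7/ore.xml</atom:id>
  <atom:title>Example item</atom:title>
  <atom:updated>2020-01-01T00:00:00Z</atom:updated>
  <atom:link rel="alternate" href="http://example.org/items/7"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
             href="http://example.org/bitstream/7/1/article.pdf"
             title="article.pdf" type="application/pdf"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
             href="http://example.org/bitstream/7/2/license.txt"
             title="license.txt" type="text/plain"/>
</atom:entry>
"""

if __name__ == "__main__":
    for href, title, media_type in aggregated_resources(SAMPLE_REM):
        print(href, title, media_type)
```

Nothing in this approach depends on the `href` values being DSpace bitstream URLs or handles, which may be relevant to the DOI and rdf:resource questions above.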

danydvd commented 7 years ago

Here is a summary of what I have found so far on exposure of ORE through Atom:

ORE through Atom