idn-au / catalogue-data

Creative Commons Attribution 4.0 International

Harvest data from ANU OAI-PMH #21

lalewis1 closed this 2 months ago

lalewis1 commented 3 months ago

https://openresearch-repository.anu.edu.au/oai/request?verb=ListSets

Start with the ANU Thesis sets and then anything else with ANU in the title.

It is a DSpace server.
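The set selection above can be sketched in Python using only the standard library. This is a sketch under assumptions, not the harvester used here. It parses a standard OAI-PMH `ListSets` response, keeps sets with "ANU" in their name, and puts thesis sets first. The "thes" substring test (to catch both "Thesis" and "Theses") is a hypothetical heuristic. It also ignores any `resumptionToken` on the `ListSets` response, so a long set list would need extra paging.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ENDPOINT = "https://openresearch-repository.anu.edu.au/oai/request"


def fetch(url):
    """Download a URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", default="", namespaces=OAI_NS),
            s.findtext("oai:setName", default="", namespaces=OAI_NS),
        )
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


def anu_sets(sets):
    """Keep sets with 'ANU' in the name; thesis sets first (hypothetical heuristic)."""
    matching = [s for s in sets if "ANU" in s[1]]
    # False sorts before True, so names containing "thes" come first;
    # sorted() is stable, so the server's order is otherwise preserved.
    return sorted(matching, key=lambda s: "thes" not in s[1].lower())
```

Usage would be along the lines of `anu_sets(parse_sets(fetch(ENDPOINT + "?verb=ListSets")))`. Keeping parsing separate from fetching makes the filter easy to check against a saved response.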

lalewis1 commented 2 months ago

Data can be downloaded per set, in batches, as RDF using the following URL syntax:

https://openresearch-repository.anu.edu.au/oai/request?verb=ListRecords&resumptionToken=rdf///com_1885_1/0

where results are returned in batches of 100, starting from the record offset given at the end of the URL (0 here), and com_1885_1 is the set identifier (sets are collections of records, e.g. the ANU Research set).
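The batch scheme above can be sketched as a paging loop. This sketch assumes the token layout `<prefix>///<set>/<offset>` seen in the example URL holds for every set; the two empty segments are presumably optional date bounds, which is an assumption. It also assumes that a batch shorter than 100 records, or one with no records, marks the end of the set. The standard OAI-PMH alternative is to request `verb=ListRecords&metadataPrefix=rdf&set=com_1885_1` and follow the `resumptionToken` the server returns. The `fetch` argument is injected so the loop can be tested without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ENDPOINT = "https://openresearch-repository.anu.edu.au/oai/request"
BATCH_SIZE = 100  # batch size described for this server


def batch_url(set_spec, offset, prefix="rdf"):
    """Build a ListRecords URL with a hand-built token: <prefix>///<set>/<offset>."""
    token = f"{prefix}///{set_spec}/{offset}"
    return f"{ENDPOINT}?verb=ListRecords&resumptionToken={token}"


def http_fetch(url):
    """Default fetcher: return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def count_records(xml_text):
    """Count <record> elements in a ListRecords response (0 for OAI error replies)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//oai:record", OAI_NS))


def harvest_set(set_spec, fetch=http_fetch):
    """Yield raw XML batches for one set until a batch is empty or short."""
    offset = 0
    while True:
        page = fetch(batch_url(set_spec, offset))
        n = count_records(page)
        if n == 0:
            break  # past the end, or the server answered with an OAI error
        yield page
        if n < BATCH_SIZE:
            break  # a partial batch is the last one
        offset += BATCH_SIZE
```

For example, `for page in harvest_set("com_1885_1"): ...` would write each batch to disk before conversion. Stopping on a short batch saves one request per set. Stopping on an empty batch covers sets whose size is an exact multiple of 100.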

lalewis1 commented 2 months ago

Data harvested for all sets whose names start with ANU. Initially, the data was converted using the same RDF mappings as provided by the OAI-PMH server, with most values mapped as literals.

Mappings still to be improved.

lalewis1 commented 2 months ago

Mapped the dc:type literals to appropriate SDO.CreativeWork subclasses, bringing the data in line with the extracts from aries and the thesis dump.
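The dc:type mapping above could look like the following sketch over plain `(subject, predicate, object)` tuples. The `TYPE_MAP` keys are hypothetical example literals, not the repository's actual dc:type vocabulary. The schema.org namespace is written as `https://schema.org/`, which is what rdflib's `SDO` namespace uses. Unknown literals fall back to `CreativeWork`. The original literal triple is kept alongside the new `rdf:type`, a choice that stays lossless.

```python
SDO = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TYPE = "http://purl.org/dc/elements/1.1/type"

# Hypothetical table: example dc:type literals -> schema.org CreativeWork subclasses.
TYPE_MAP = {
    "thesis": "Thesis",
    "journal article": "ScholarlyArticle",
    "book": "Book",
    "book chapter": "Chapter",
    "report": "Report",
    "dataset": "Dataset",
}


def map_dc_types(triples):
    """Add an rdf:type to a schema.org class for every dc:type literal triple."""
    out = []
    for s, p, o in triples:
        out.append((s, p, o))  # keep the original triple
        if p == DC_TYPE:
            # Normalise the literal; unmapped values fall back to CreativeWork.
            cls = TYPE_MAP.get(o.strip().lower(), "CreativeWork")
            out.append((s, RDF_TYPE, SDO + cls))
    return out
```

Normalising case and whitespace before the lookup keeps the table small when the server emits variants such as "Journal Article" and "journal article".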

lalewis1 commented 2 months ago

All pushed to GraphDB.
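Loading into GraphDB can be sketched through the RDF4J-style REST API that GraphDB exposes, by POSTing serialized RDF to `/repositories/<id>/statements`. The base URL (`http://localhost:7200`, GraphDB's usual default port) and the repository id `anu` are hypothetical, and this is not necessarily how the data here was pushed. The request is built separately from sending it so it can be inspected without a running server.

```python
import urllib.request


def build_upload_request(base_url, repository, rdf_text, content_type="text/turtle"):
    """Build a POST that appends RDF statements to a GraphDB/RDF4J repository."""
    url = f"{base_url.rstrip('/')}/repositories/{repository}/statements"
    return urllib.request.Request(
        url,
        data=rdf_text.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def upload(req):
    """Send the request; a 2xx status indicates the statements were accepted."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status
```

A hypothetical call would be `upload(build_upload_request("http://localhost:7200", "anu", turtle_text))`, once per converted batch. POST adds to the repository rather than replacing it, so re-running a harvest would need a clear step first to avoid duplicates.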