ualbertalib / HydraNorth

This repo is deprecated and has been succeeded by https://github.com/ualbertalib/jupiter. This codebase was an IR built on Samvera/Sufia.

Spike: modify metadata via Fedora API #948

Closed: pbinkley closed this issue 8 years ago

pbinkley commented 8 years ago

Test the efficiency of updating metadata fields on specific objects via the Fedora API, compared to rake tasks or manual jobs in the Rails console, based on the examples in Ruth Tillman, "Extracting, Augmenting, and Updating Metadata in Fedora 3 and 4 Using a Local OpenRefine Reconciliation Service", in the latest issue of the Code4Lib Journal: http://journal.code4lib.org/articles/11179

pbinkley commented 8 years ago

Preliminary results for a set of 10 records in a dev VM, where the job is to delete all creators and add a single creator. The rake task takes almost a minute; the Fedora API takes under a second. But the rake task includes the Solr update, which isn't (and can't be) part of the Fedora API call.

rake task

real    0m50.937s
user    0m33.410s
sys     0m15.458s

sparql-update rest api, one transaction per item

real    0m0.815s
user    0m0.192s
sys     0m0.016s

sparql-update rest api, one transaction includes all items

real    0m0.793s
user    0m0.188s
sys     0m0.012s
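
For reference, the kind of request timed above could look roughly like the following. This is a minimal sketch, not the script used for these timings. It assumes Fedora 4's SPARQL Update support via HTTP PATCH with `Content-Type: application/sparql-update`, and it assumes creators are stored under `dc:creator`, which may not match the actual predicate. The function names and the example object URL are hypothetical.

```python
import urllib.request

# Assumption: creators live under dc:creator; the repository's real predicate may differ.
CREATOR_PREDICATE = "http://purl.org/dc/elements/1.1/creator"


def build_creator_update(new_creator: str) -> str:
    """Return a SPARQL Update that deletes all creators and adds a single one."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = new_creator.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"DELETE {{ <> <{CREATOR_PREDICATE}> ?old }}\n"
        f"INSERT {{ <> <{CREATOR_PREDICATE}> \"{escaped}\" }}\n"
        # OPTIONAL keeps the INSERT working even when no creator exists yet.
        f"WHERE {{ OPTIONAL {{ <> <{CREATOR_PREDICATE}> ?old }} }}"
    )


def build_patch_request(object_url: str, new_creator: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for one repository object."""
    return urllib.request.Request(
        object_url,
        data=build_creator_update(new_creator).encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="PATCH",
    )


# Hypothetical object URL on a local dev instance.
req = build_patch_request("http://localhost:8080/fcrepo/rest/dev/ab/cd", "Binkley, Peter")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of `urllib.request.urlopen(req)` against a running Fedora instance. As noted above, this only changes Fedora; the Solr index still has to be updated separately.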

pbinkley commented 8 years ago

And a rake task to do the Solr update only:

real    0m37.845s
user    0m26.345s
sys     0m10.024s

So on that basis, the Fedora API plus a rake task for the Solr update takes about 38.7s in total, which saves about 24% compared to the 50.9s rake task for the whole job. I'll load a larger test set overnight and see how that looks.
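
The 24% figure can be checked from the `real` times reported above. The sketch below uses the one-transaction-per-item timing for the API step; the helper name `to_seconds` is just for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '0m50.937s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


full_rake = to_seconds("0m50.937s")   # rake task doing Fedora + Solr
api_update = to_seconds("0m0.815s")   # SPARQL Update via REST, one transaction per item
solr_only = to_seconds("0m37.845s")   # rake task doing the Solr update only

combined = api_update + solr_only
saving = (full_rake - combined) / full_rake

print(f"{combined:.3f}s vs {full_rake:.3f}s: {saving:.1%} saved")
# prints "38.660s vs 50.937s: 24.1% saved"
```

Almost all of the remaining time is the Solr update, so on this 10-record sample the Fedora write itself is a negligible part of the combined job.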