ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

Recover Solr data from Fedora #410

Closed weiweishi closed 6 years ago

weiweishi commented 6 years ago

We will need to figure out a way to reindex Solr from Fedora data for emergency recovery, in case SolrCloud redundancy and backups both fail.

mbarnett commented 6 years ago

Notes for when it comes time to deal with this:

We can steal the basic sketch of how to do this from https://github.com/samvera/active_fedora/blob/7e9c365c00ced6ce4175096a3ff7b423cc72bf64/lib/active_fedora/indexing.rb#L95, although that code won't directly work for us because it doesn't account for the custom indexing and lambdas that come through from LockedLDPObject (it's not obvious to me that this works in raw Sufia either). Also, the "what Fedora objects get reindexed, what subobject graphs don't" code is rather convoluted: https://github.com/samvera/active_fedora/blob/e4685f953d3dcce8357d2e4e58df4092bbb52ea9/lib/active_fedora/indexing/descendant_fetcher.rb

Basic sketch of approach is:

1: Walk the Fedora tree, looking for anything with a hasModel.
2: Pull it via ActiveFedora::Base.find(id from uri of object from tree). I thought this might need typecasting to a concrete class, but it doesn't seem to, now that I've tried it.
3: Instantiate a LockedLDPObject and connect its @ldp_object instance to the thing we pulled in 2.
4: Either unlock and save, or directly write the Solr data as in the first example above.
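
The steps above could be sketched roughly as follows. This is only an illustration of the control flow, not working Jupiter code: `Node`, `each_modelled_uri`, `id_from_uri`, and `reindex_all` are hypothetical names, `fetch` stands in for whatever LDP request returns a resource's hasModel value and contained URIs, and taking the last path segment as the id is an assumption. Unlike ActiveFedora's descendant fetcher, this walk makes no attempt to skip subobject graphs.

```ruby
# Hypothetical sketch: none of these names come from Jupiter or ActiveFedora.
# Node models what we need from one Fedora resource: its hasModel value
# (nil when absent) and the URIs of the resources it contains.
Node = Struct.new(:has_model, :children)

# Step 1: breadth-first walk of the tree, yielding every URI whose
# resource carries a hasModel. `fetch` is any callable mapping URI -> Node.
def each_modelled_uri(root_uri, fetch)
  return enum_for(:each_modelled_uri, root_uri, fetch) unless block_given?

  queue = [root_uri]
  seen = {}
  until queue.empty?
    uri = queue.shift
    next if seen[uri]

    seen[uri] = true
    node = fetch.call(uri)
    next if node.nil?

    yield uri if node.has_model
    queue.concat(node.children)
  end
end

# Assumption: the object id is the last path segment of its URI.
def id_from_uri(uri)
  uri.split('/').last
end

# Steps 2-4: look each object up by id and hand it to an indexing callable.
# In a real run `find` might be ->(id) { ActiveFedora::Base.find(id) } and
# `index` would wrap the object in a LockedLDPObject and write its Solr
# document. Returns the number of objects passed to `index`.
def reindex_all(root_uri, fetch, find:, index:)
  count = 0
  each_modelled_uri(root_uri, fetch) do |uri|
    index.call(find.call(id_from_uri(uri)))
    count += 1
  end
  count
end
```

Passing `find` and `index` in as callables keeps the walk testable without a running Fedora or Solr, and leaves the "unlock and save" versus "write the Solr data directly" choice from step 4 to the caller.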

pbinkley commented 6 years ago

Any hope of doing all this from the triplestore and leaving Fedora in peace? (That would assume that whatever broke Solr didn't also break the triplestore, of course.)

mbarnett commented 6 years ago

I don't think so, at least not as long as we're dependent on ActiveFedora. It would be easier to decouple this stuff post-Fedora.

mbarnett commented 6 years ago

Merged as of #809