crisr15 closed 5 months ago
I used ChatGPT for a first pass at this, and I think it did a pretty good job. I plan on checking the external resources to see if there's anything it missed, but here is what it had so far:
Overall, the process for upgrading the Solr version used by the Oral History Rails app will involve reviewing changes in Solr 8.11, updating the Solr configuration files and the solr-ruby gem, and potentially reindexing data with the new version of Solr.
I know we use S3, and I saw this in one section of the upgrade notes. Not sure if it will apply to us:
Solr 8.10 provides support for storing backups in Amazon S3 buckets. See the section S3BackupRepository for how to configure.
We should most likely reindex after the upgrade, which is hopefully easier than a "reimport" as mentioned in the ticket.
I found this section in the version 8 major changes notes:
It is always strongly recommended that you fully reindex your documents after a major version upgrade.
Solr has a new section of the Reference Guide, Reindexing which covers several strategies for how to reindex.
It’s now possible to overwrite an existing configset when uploading changes by supplying the overwrite=true parameter to the Configset API.
A related parameter is cleanup=true, which allows deleting any files from the old configset that are left behind after the overwrite.
The default for both of these parameters is false.
When deleting a collection that has an automatically created configset (i.e., the configset was copied from the _default collection when the collection was created), the configset will also be deleted if it is not in use by any other collection.
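To make the `overwrite` and `cleanup` parameters quoted above concrete, here is a minimal sketch of building a Configset API upload request. The base URL, the configset name `oral-history`, and the helper names are assumptions for illustration only, not values from our setup, and nothing is sent unless `upload_configset` is called explicitly.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def configset_upload_url(base_url, name, overwrite=False, cleanup=False):
    """Build a Configset API UPLOAD URL (hypothetical helper).

    Both flags default to False, matching the defaults in the upgrade notes.
    """
    params = {
        "action": "UPLOAD",
        "name": name,
        "overwrite": str(overwrite).lower(),
        "cleanup": str(cleanup).lower(),
    }
    return f"{base_url.rstrip('/')}/admin/configs?{urlencode(params)}"


def upload_configset(url, zip_path):
    """POST a zipped configset to Solr. Not called here; requires a running Solr."""
    with open(zip_path, "rb") as fh:
        request = Request(
            url,
            data=fh.read(),
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
    with urlopen(request) as response:
        return response.status


# Assumed local Solr URL and configset name, for illustration only.
url = configset_upload_url(
    "http://localhost:8983/solr", "oral-history", overwrite=True, cleanup=True
)
print(url)
```

With `cleanup=True`, files from the old configset that are not in the new upload get deleted, so it is probably worth trying against a non-production Solr first.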
Solr 8.11 major changes: https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-8.html
Major changes between 7 & 8: https://solr.apache.org/guide/8_11/major-changes-in-solr-8.html#major-changes-in-earlier-7-x-versions
Related ticket created: https://github.com/scientist-softserv/oral-history/issues/25
The Solr version will need to be updated in the Dockerfile:
@summer-cook Will the wave forms be deleted/need to be recreated when we reindex? Where are those stored?
@crisr15 I didn't find anything in the docs that talked about wav files specifically. They most likely wouldn't need to be deleted, just reindexed along with everything else in the app. Their readme has a specific section saying that creating the wav files is very time intensive. I'm not sure exactly where the wav files are stored, but I'm pretty sure they use an OAI feed. One other thing to note: I spoke to Jeremy about how Blacklight is reindexed, and here is our convo:
I also found this in the apache solr docs:
Because Blacklight has no built-in reindex equivalent like Spotlight/Hyku do, and Oral History uses an OAI feed, the way to reindex would be to repeat however they got the files into Solr in the first place, or potentially to rerun the OAI feed.
I don't know if something similar to Harvard's spotlight_oaipmh gem might be useful here, or if they already have a process in place for reindexing.
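For reference, if the feed is a standard OAI-PMH endpoint (an assumption; I haven't confirmed how the app actually ingests), rerunning it amounts to paging through `ListRecords` responses via the `resumptionToken`. A rough sketch of that paging logic follows; the endpoint URL, sample XML, and helper names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    Per the OAI-PMH spec, resumptionToken is an exclusive argument, so
    metadataPrefix is only sent on the first request.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response standing in for one page of a real feed.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
print(ids, next_url)
```

A real reindex would loop until the token comes back empty and send each record through whatever indexing code the app already uses; this sketch only covers the harvesting side.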
Summary
Look into upgrade paths from solr 7.7 to 8.11. This ticket should explore what needs to be done when updating including if there is any migration, if items will need to be reimported, and whether or not wave files will need to be remade. (Please note, we would prefer not to remake the wave files as they are time consuming).
Acceptance Criteria
Related ticket created: https://github.com/scientist-softserv/oral-history/issues/25