An application for indexing and displaying IIIF Manifests from multiple partners in an effort to create a high-quality digital version of Count Leopoldo Cicognara's private book collection.
git clone https://github.com/pulibrary/cicognara-rails.git
cd cicognara-rails
bundle install
lando start
bundle exec rake db:setup
Remember you'll need to run bundle install on an ongoing basis as dependencies are updated.
lando start
RAILS_ENV=test bundle exec rake db:setup
bundle exec rspec
Install Lando:
lando start
bundle exec rake cico:development:clean_and_seed
bin/rails s
bin/rails c
> u = User.new
> u.email = "me@example.com"
> u.role = "admin"
Look at app/controllers/application_controller.rb to see how this user is always logged in on a development site. You can comment out the current_user method there if you want to see a non-admin view.
cd tmp
git clone git@github.com:pulibrary/cicognara-catalogo.git
cd ..
TEIPATH=tmp/cicognara-catalogo/catalogo.tei.xml MARCPATH=tmp/cicognara-catalogo/cicognara.mrx.xml bundle exec rake tei:index
TEIPATH=tmp/cicognara-catalogo/catalogo.tei.xml bundle exec rake tei:partials
bundle exec rake getty:import
To create a tagged release use the steps in the RDSS handbook
When deploying, make sure the desired cicognara-catalogo (MARC and TEI) release is specified:
# config/deploy.rb
set :default_env,
'MARCPATH' => 'public/cicognara.mrx.xml',
'TEIPATH' => 'public/catalogo.tei.xml',
'CATALOGO_VERSION' => 'v2.1'
After deploying, please invoke the following in order to reindex from the latest release of the Catalogo, and rebuild the partials:
cap [STAGE] deploy:reindex
Another way to reindex after a deployment is to SSH to the machine and execute the following rake tasks, as we did in March 2022. Note that we had to fetch the Getty files manually via cURL before running getty:import because the download process fails intermittently; once the files are on disk, the rake task uses them and completes successfully.
cd /opt/cicognara/current
# Get latest source data files
TEIPATH=public/cicognara.tei.xml MARCPATH=public/cicognara.mrx.xml bundle exec rake tei:catalogo:update
# Run TEI Index (about 5 minutes)
TEIPATH=public/cicognara.tei.xml MARCPATH=public/cicognara.mrx.xml bundle exec rake tei:index
# Regenerate the partials
TEIPATH=public/cicognara.tei.xml MARCPATH=public/cicognara.mrx.xml bundle exec rake tei:partials
# (optional) Get the Getty files via cURL when the site is acting up and requires retries.
# Find the latest set at http://portal.getty.edu/resources/json_data/resourcedump.xml
cd tmp
curl -OL http://portal.getty.edu/resources/json_data/resourcedump_2022-10-31-part1.zip
curl -OL http://portal.getty.edu/resources/json_data/resourcedump_2022-10-31-part2.zip
curl -OL http://portal.getty.edu/resources/json_data/resourcedump_2022-10-31-part3.zip
curl -OL http://portal.getty.edu/resources/json_data/resourcedump_2022-10-31-part4.zip
cd ..
# Run the Getty import (1 hr)
TEIPATH=public/cicognara.tei.xml MARCPATH=public/cicognara.mrx.xml bundle exec rake getty:import
# Re-index the TEI
TEIPATH=public/cicognara.tei.xml MARCPATH=public/cicognara.mrx.xml bundle exec rake tei:index
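Since the Getty download fails intermittently, the manual cURL step above could be wrapped in a small retry loop. The following is only a sketch: with_retries and download_getty_dump are hypothetical helpers that are not part of this application, and the dump date and number of parts are taken from the example commands above and will change over time.

```ruby
# Hypothetical helper (not part of the app): run the block until it succeeds
# or `attempts` tries have been made, sleeping `wait` seconds between tries.
def with_retries(attempts: 3, wait: 5)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    raise if tries >= attempts # give up and re-raise the last error
    warn "attempt #{tries} failed (#{e.class}: #{e.message}); retrying in #{wait}s"
    sleep wait
    retry
  end
end

# Hypothetical helper: fetch each part of a Getty resource dump into `dir`
# by shelling out to curl. -f makes curl exit non-zero on HTTP errors,
# -O keeps the remote file name, -L follows redirects.
def download_getty_dump(date:, parts: 4, dir: "tmp")
  base = "http://portal.getty.edu/resources/json_data"
  (1..parts).each do |part|
    url = "#{base}/resourcedump_#{date}-part#{part}.zip"
    with_retries do
      system("curl", "-fOL", url, chdir: dir) or raise "curl failed for #{url}"
    end
  end
end
```

Under these assumptions, calling download_getty_dump(date: "2022-10-31") from the application root would fetch the four files listed above, after which getty:import should find them on disk.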
If you need to make someone an admin on a production box, ensure they've logged in once, then run the set_admin_role task for their email address:
EMAIL=user@example.org bundle exec rake set_admin_role
To make changes to the Solr in production/staging you need to update the files in the pul_solr repository and deploy them. The basic steps are:
cap solr8-staging deploy
You can see the list of Capistrano environments here.
The deploy will update the configuration for all Solr collections in the given environment, but it does not cause downtime. If you need to manually reload a configuration for a given Solr collection you can do it via the Solr Admin UI.
The environment values used for integration with Google Authentication (GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET) are defined via console.cloud.google.com. You can use this page to reset the Client ID and Client secret values. If you don't have access to this page please contact Esmé.
There are three main sources of information for this project:
The process to index TEI and MARC files (i.e. rake tei:index) ingests the data into a single Solr collection, but it creates separate documents for each source. For example, there are two Solr documents for alt_id:"dcl:nvx": one of them (the one with format:marc) has the MARC data, whereas the other has the TEI data. Records from TEI (i.e. -format:marc) are what is searched when a user submits a search.
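To inspect both documents for a given alt_id, one option is to query Solr directly with a filter on the format field. The sketch below only builds the request URLs. The Solr host, port, and collection name in SOLR_SELECT are assumptions (this README does not state them), and solr_url_for is a hypothetical helper, not part of the application.

```ruby
require "uri"

# Assumed endpoint; replace with the Solr URL of your environment.
SOLR_SELECT = "http://localhost:8983/solr/cicognara/select"

# Hypothetical helper: build a select URL for one alt_id, restricted to
# either the MARC document (format:marc) or the TEI document (-format:marc).
def solr_url_for(alt_id, source:)
  fq = case source
       when :marc then "format:marc"
       when :tei  then "-format:marc"
       else raise ArgumentError, "source must be :marc or :tei"
       end
  params = { "q" => %(alt_id:"#{alt_id}"), "fq" => fq, "wt" => "json" }
  "#{SOLR_SELECT}?#{URI.encode_www_form(params)}"
end

puts solr_url_for("dcl:nvx", source: :marc)
puts solr_url_for("dcl:nvx", source: :tei)
```

Each printed URL can be passed to curl or opened in a browser; the negative filter mirrors the way the TEI records are described above.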
This part of the import process also creates records in the Books and Entries tables to represent some of this data.
There is another process that fetches and processes the data from the Getty (i.e. rake getty:import). This process downloads files from Getty, unzips them into about 50,000 JSON files, and finds records that are associated with the "Cicognara Collection". There are about 9,600 records that meet these criteria. For each one of them it processes the manifest_urls indicated in the Getty record and creates a Version record to store the metadata for each different manifest.
This part of the process is slow-ish since, for each record, it contacts each different institution indicated in the manifest_url to fetch the data. This data is saved in the Version table.
Notice that the data stored in the Version table is displayed to the user but it is not indexed in Solr.
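The filtering step of the Getty import described above can be pictured roughly as follows. This is an illustrative sketch and not the code behind rake getty:import: the "collection" key is an assumed field name for where the collection label lives in each Getty JSON file, both helper methods are hypothetical, and the step that fetches each manifest and creates Version records is left out.

```ruby
require "json"

COLLECTION = "Cicognara Collection"

# Assumption: each unzipped Getty file holds one JSON object whose
# "collection" value (a string or an array of strings) names its collection.
def cicognara_record?(record)
  Array(record["collection"]).include?(COLLECTION)
end

# Walk a directory of unzipped JSON files and collect the manifest URLs of
# every record that belongs to the Cicognara Collection.
def manifest_urls_from(dir)
  Dir.glob(File.join(dir, "*.json")).sort.flat_map do |path|
    record = JSON.parse(File.read(path))
    next [] unless cicognara_record?(record)
    Array(record["manifest_urls"])
  end
end
```

In the real import, each URL returned here would then be fetched from the hosting institution, which is where most of the running time goes.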
Check out the following documents for additional information on the data and how it is modeled: