wellcomecollection / alpha

Alpha version of a catalogue explorer for Wellcome Library (deprecated).

Wellcome Library Alpha

Background

Alpha (2015-2020) was a demonstration of how Wellcome Library's library catalogue data could be explored in more browsable, visually interesting ways.

Formerly part of the Wellcome Library website, Alpha built on the work from four prototypes we made earlier - What's In The Library? - and moved it to the Wellcome Library infrastructure.

Our collaborators at Good, Form & Spectacle - @george08, @frankieroberto and @tomstuart - explored our MARC21 bibliographic data and digitised content over two phases during 2015/2016. We had a lot of fun working with them!

Even though the sites have been decommissioned, you can still see what they looked like through the Internet Archive:

What's In The Library?: a 4-week prototyping project

Wellcome Library Alpha: the front-end collection explorer

We also documented the process heavily in our Project Blog.

Disclaimer

Wellcome Collection and Good, Form & Spectacle no longer maintain this code or any of the underlying services. Use at your own risk! :skull:

Installation

Running the website, either locally or in production, requires setting up both the website code itself and some additional services.

Website

The server-side code for the website is written in Ruby. The first requirement is having the correct version of this installed, as specified in the Gemfile (currently Ruby 2.2.3).

This can be installed either as the system Ruby or, for local development, using a tool like rbenv or rvm (which allow multiple versions of Ruby to be installed on the same machine).

Once Ruby is installed, the library (or 'gem') called Bundler needs to be installed (if not already) – this allows for easy installation and management of additional Ruby gems. It can be installed using gem install bundler.

Once Bundler is installed, the rest of the libraries (as specified in the Gemfile and Gemfile.lock) can be installed by running bundle install.

To start the web server, you can run the command specified in the Procfile:

bundle exec puma -C config/puma.rb

Alternatively, for local development you could install the gem foreman (by running gem install foreman) and then run foreman start web. As well as running the command specified in the Procfile, this also reads the contents of .env and exposes the contents as environment variables.

As well as the web server, there is also a job queue worker process (run by a library called Sidekiq), which can be started by running foreman start worker or bundle exec sidekiq.

Running foreman start starts both a web server and a job queue worker at the same time.

Services

The website requires the following services to be installed and running on a server somewhere:

Postgres

Postgres is used as the primary database. It is an open source project, and can either be compiled and installed from source, or you can use a commercial cloud service (like Heroku or Amazon RDS).

For local development on a Mac, the easiest way to get it up and running is via Postgres.app, a packaged OS X app.

The project requires version 9.4 or greater.

The hostname, port, username, password and database name should be set in an environment variable called DATABASE_URL (e.g. postgres://user:pass@host:port/database_name).
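To make the shape of that URL concrete, here is a minimal sketch that splits a DATABASE_URL-style string into its parts using Ruby's standard `uri` library. The helper name `parse_database_url` is hypothetical and is not part of this project; the application itself may read the variable differently.

```ruby
require "uri"

# Hypothetical helper (not part of this codebase): split a
# DATABASE_URL-style string into its component parts.
def parse_database_url(url)
  uri = URI.parse(url)
  {
    adapter:  uri.scheme,
    username: uri.user,
    password: uri.password,
    host:     uri.host,
    port:     uri.port,                      # nil when the URL omits a port
    database: uri.path.sub(%r{\A/}, "")      # drop the leading slash
  }
end
```

Calling `parse_database_url(ENV.fetch("DATABASE_URL"))` before starting the server can be a quick way to confirm that the variable is set and that each part is what you expect.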

The tables and indexes for the database can be set up for the first time by running bundle exec rake db:setup. After that, any future changes to the database structure can be applied by running bundle exec rake db:migrate.

Redis

Redis is used as a secondary datastore for managing the 'job queue' (tasks that need to be run). Again, it can be compiled and run from source, or you can use a commercial cloud service (like Heroku or Amazon ElastiCache).

The project requires version 2.8 or greater (3.0.3+ is recommended for large installations). These requirements come from Sidekiq, the queue manager.

The hostname, port, username and password should be set in an environment variable called REDIS_URL (e.g. redis://user:pass@host:port).

ElasticSearch

ElasticSearch is used as a secondary datastore for fast searching and querying. Again, it can be compiled and run from source, or you can use a commercial cloud service (like Searchly, QBox or Amazon).

You should use version 1.7.*

The hostname, username and password should be set in an environment variable called ELASTICSEARCH_URL (e.g. https://user:pass@host).
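The three services in this section are each configured through one environment variable (DATABASE_URL, REDIS_URL and ELASTICSEARCH_URL). As a rough sketch, a pre-flight check along the following lines could report which of them are missing before you start the web server or worker. The constant and method names are hypothetical and do not exist in this project.

```ruby
# Hypothetical pre-flight check (not part of this codebase).
REQUIRED_SERVICE_VARS = %w[DATABASE_URL REDIS_URL ELASTICSEARCH_URL].freeze

# Return the names of required variables that are unset or blank in `env`.
# `env` defaults to the process environment but accepts any Hash-like object.
def missing_service_vars(env = ENV)
  REQUIRED_SERVICE_VARS.select { |name| env[name].to_s.strip.empty? }
end
```

For example, `abort("Missing: #{missing_service_vars.join(', ')}") unless missing_service_vars.empty?` would stop early with a readable message. Note that when you start the app with foreman, these variables may come from the .env file rather than your shell.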

Ingesting MARC XML records

If you are developing locally, it is probably best to import a copy of the live database (using the Postgres backup tools) rather than re-ingesting the MARC XML files (as this way your IDs will be consistent with the live site).

However if starting from scratch, or if wanting to update the live website with new and updated MARC records, you can run the various ingest scripts.

The main one is:

bundle exec rake ingest:records[filename] - where filename is the name of a MARC XML file (including extension) located in your local import folder of the project.

Note: This may take some time, depending on the file size. You will also need to do this for each separate XML file.
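Because the ingest task has to be run once per file, a small wrapper script can save some typing. The following is only a sketch: it assumes the MARC XML files live in a folder named `import` at the project root and end in `.xml`, and the method names are hypothetical rather than anything provided by this project.

```ruby
# Hypothetical wrapper (not part of this codebase).
# Build one `rake ingest:records[...]` command per XML file in a folder.
# The default folder name "import" is an assumption.
def ingest_commands(import_dir = "import")
  Dir.glob(File.join(import_dir, "*.xml")).sort.map do |path|
    # The task takes the file name including its extension, not the full path.
    ["bundle", "exec", "rake", "ingest:records[#{File.basename(path)}]"]
  end
end

# Run each command in turn, stopping at the first failure.
def run_ingest(import_dir = "import")
  ingest_commands(import_dir).each do |command|
    system(*command) or raise "Ingest failed: #{command.join(' ')}"
  end
end
```

Passing the command to `system` as separate arguments bypasses the shell, so the square brackets in the task name do not need quoting.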

Once ingested, various other scripts may need to be run:

bundle exec rake people:queue_all_for_update_from_records – this extracts author information from the records and creates People and Creator (join table) records.

bundle exec rake subjects:import_from_records – this extracts subject information from the records and creates Subject and Tagging (join table) records.

bundle exec rake records:queue_download_package_job_for_newly_digitized_things – this updates the digitized status of the records.

bundle exec rake people:get_identifiers - this gets (other) identifiers for people using the VIAF service.

bundle exec rake people:queue_all_for_update_from_wikipedia - this updates Wikipedia information (bio and photos) for people.

Updating ElasticSearch

ElasticSearch currently can only be updated by deleting the indexes and re-importing all of the data. This can be done with:

bundle exec rake elasticsearch:people – People data

bundle exec rake elasticsearch:subjects – Subject data

bundle exec rake elasticsearch:records – Record data