gwu-libraries / scholarspace-hyrax

Hyrax-based repository for GW ETDs and other original content
https://scholarspace.library.gwu.edu/
MIT License

scholarspace-hyrax

A Hyrax app for GW Libraries with:

The public application is accessible at https://scholarspace.library.gwu.edu/

Some convenient links to have handy:

Docker images

The Dockerized version of the ScholarSpace app uses the following images:

| Component | Image Name | Version | Source |
| --- | --- | --- | --- |
| Fedora server | ghcr.io/samvera/fcrepo4 | 4.7.5 | https://github.com/samvera-labs/docker-fcrepo |
| Postgres (for Fedora) | postgres | 15.4 | https://hub.docker.com/_/postgres |
| Postgres (for Hyrax) | postgres | 9.5.25-alpine | https://hub.docker.com/_/postgres |
| Solr | library/solr | 6.4.2-alpine | Dockerfile-solr |
| Rails app | scholarspace-app | -- | Dockerfile |
| Redis server | redis | 5-alpine | https://hub.docker.com/_/redis |
| Sidekiq | scholarspace-app | -- | Dockerfile |

Notes on Docker images

Fedora server

Postgres

Solr

Rails/Hyrax app

Sidekiq

Setting up the application

  1. Install the Docker engine.
  2. Edit /etc/group and add your user (e.g., ubuntu) to the docker group.
  3. Run id $USER and note the values for uid and gid. Below you will add those to the .env file. Note: You can create a separate user for the app, but it is not necessary. That user will need to own /opt/scholarspace and subdirectories.
  4. Create an /opt/scholarspace directory and clone the scholarspace-hyrax repository inside it.
  5. Create a directory for derivatives: mkdir -p /opt/scholarspace/scholarspace-derivatives.
    • For development, also create the following (empty) directories:
      /opt/scholarspace/certs 
      /opt/scholarspace/scholarspace-tmp 
      /opt/scholarspace/scholarspace-minter 
      /opt/scholarspace/scholarspace-ingest 
  6. In /opt/scholarspace/scholarspace-hyrax run cp example.env .env to create the local environment file.
  7. Edit .env to add the following values:
    • SCHOLARSPACE_GID, SCHOLARSPACE_UID
    • HYRAX_DB_USER, HYRAX_DB_PASSWORD (for the Hyrax app database)
    • CONTACT_EMAIL
    • Variables under the #Recaptch config comment
    • SMTP_USER and SMTP_PASSWORD
    • SERVER_NAME (hostname for Nginx)
    • NGINX_CERT_DIR and NGINX_KEY_DIR
    • SSL_ON (set to true if using)
    • SSL_ variables (if using)
    • PERM_URL_BASE (used for persistent links) Make sure to terminate the URL with a forward slash.
    • FEDORA_PG_USER, FEDORA_PG_PASSWORD, FEDORA_USER, FEDORA_PASSWORD (username and password for the Fedora db backend and the Fedora app, respectively)
    • SOLR_DATA_DIR (directory for Solr cores on host machine)
    • FEDORA_DATA_DIR (directory for Fedora data on host machine)
  8. Configure ReCAPTCHA.
    • Log in to the Google domain using the Google account that should have access to managing ReCAPTCHA API keys.
    • Create ReCAPTCHA API keys at the ReCAPTCHA admin console.
    • Register a new site (at https://www.google.com/recaptcha/admin/create). Select the ReCAPTCHA v2 "I'm not a robot" Checkbox type.
    • In the .env file, set the ReCAPTCHA API keys:
        RECAPTCHA_SITE_KEY=
        RECAPTCHA_SECRET_KEY=
  9. Adjust any other variables in the .env as needed.
  10. Edit docker-compose.yml as necessary.
    • If not using SSL, comment out the lines for the key and cert directories under the app-server service definition.
  11. If migrating data, prepare the Solr core and Fedora database locally (see below). Otherwise, create the /opt/scholarspace/fedora-data and /opt/scholarspace/solr-data directories to store the Fedora and Solr files on the host (e.g., sudo mkdir -p /opt/scholarspace/fedora-data). Grant permissions on the Solr directory to the Solr user: sudo chown -R 8983:8983 /opt/scholarspace/solr-data
  12. Start the application containers by running docker compose up -d. This will build the Hyrax app/Sidekiq and Solr images locally and start all containers. Note that the app is not yet ready to view and there may be some errors in the log at this point. If you get an authentication error, see the section on authenticating to the GitHub Container Registry below.
  13. If migrating data, restore the postgres database dumps for Fedora and Hyrax (see below).
  14. The Hyrax server will not work without the value of SECRET_KEY_BASE being set in the .env file. To generate a secret key using Rails, run docker exec -it --user scholarspace [app-server-container-name] bash -lc "docker/scripts/app-init.sh --create-secret". The app-server-container-name is probably scholarspace-hyrax-app-server-1 but can be ascertained by running docker ps.
  15. Add the secret key string to the .env file and restart the containers: docker compose down && docker compose up -d.
  16. If migrating data, run the Rake job to perform database migrations: docker exec -it --user scholarspace [app-server-container-name] bash -lc "docker/scripts/app-init.sh --run-migrations".
    • If creating a new instance (no migrated data), run the following command:
      docker exec -it --user scholarspace [app-server-container-name] bash -lc "rails db:{drop,create,migrate,seed}" 
  17. Visit the site in a web browser to trigger the Passenger app. (You won't see the compiled assets yet.)
  18. Add initial content blocks and precompile assets: docker exec -it --user scholarspace [app-server-container-name] bash -lc "docker/scripts/app-init.sh --apply-content-blocks --precompile-assets".
  19. (Production only): Restart the Nginx server: docker exec [app-server-container-name] bash -lc "passenger-config restart-app /".
  20. The following additional steps may be useful in setting up a new instance (no migrated data), all of which can be run as options of the app-init.sh script:
    • --create-roles: create default app roles (if they don't already exist)
    • --create-admin-set: create the default Admin Set, if it doesn't already exist
    • --add-admin-user: grant a ScholarSpace user the admin role. To use: first, create the user in the ScholarSpace UI. Then run this command, inserting an environment variable (admin_user=USER_EMAIL_ADDRESS) before the path to the script. This environment variable will be used by the Rake task to look up the user in the app database.
  21. To start the job to generate a sitemap, run docker exec -it --user scholarspace [app-server-container-name] bash -lc "docker/scripts/app-init.sh --create-sitemap". With the default configuration, this job runs daily at 12:30 AM; the schedule can be changed in config/schedule.rb.
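
Several of the steps above depend on values in .env being filled in. The following is a minimal sketch of a pre-flight check, assuming the file uses plain, unquoted NAME=value lines. The function names check_env and env_value are hypothetical and not part of the repository. The variable list is copied from step 7, plus SECRET_KEY_BASE from step 14; trim or extend it to match your deployment.

```shell
#!/usr/bin/env bash
# Sketch only: report .env variables that are missing or empty.
# Assumes unquoted NAME=value lines; the list below is taken from the setup steps.
required_vars="SCHOLARSPACE_GID SCHOLARSPACE_UID HYRAX_DB_USER HYRAX_DB_PASSWORD \
CONTACT_EMAIL SERVER_NAME PERM_URL_BASE FEDORA_PG_USER FEDORA_PG_PASSWORD \
FEDORA_USER FEDORA_PASSWORD SOLR_DATA_DIR FEDORA_DATA_DIR SECRET_KEY_BASE"

# env_value FILE NAME: print the value assigned to NAME in FILE (last assignment wins).
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# check_env FILE: print one line per problem; return non-zero if any were found.
check_env() {
  env_file=$1
  rc=0
  for name in $required_vars; do
    if [ -z "$(env_value "$env_file" "$name")" ]; then
      echo "missing or empty: $name"
      rc=1
    fi
  done
  # PERM_URL_BASE must be terminated with a forward slash (see step 7).
  base=$(env_value "$env_file" PERM_URL_BASE)
  case "$base" in
    ""|*/) ;;
    *) echo "PERM_URL_BASE must end with /"; rc=1 ;;
  esac
  return $rc
}
```

After sourcing the file, a call such as check_env /opt/scholarspace/scholarspace-hyrax/.env before docker compose up -d lists anything still unset. Optional variables (the SSL_ and ReCAPTCHA settings, for example) are deliberately left out of the list.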

Redeployment

For convenience with prod deployments, use the redeploy script:

sudo chmod u+x script/redeploy-app.sh 
./script/redeploy-app.sh

This script will perform the following actions:

It will still be necessary to restart the app container after visiting the site (in order for the compiled assets to be visible).

Authenticating to GHCR

The ScholarSpace app and Solr Docker images are hosted on GitHub's Container Registry. Since the images are not public (by GW policy), pulling images from the registry requires user authentication with a personal access token.

  1. Log into your GitHub account and visit https://github.com/settings/tokens (under Developer Settings).
  2. Either select an existing access token or create a new one.
  3. Grant the token the write:packages or (minimally) read:packages permission.
  4. Once you have copied your token to a secure location, select the option to Configure SSO on the token.
  5. At the command line on the server where you're installing ScholarSpace, do the following steps to log into the GHCR, using your GitHub username and the token you have created.
    export CR_PAT=[YOUR_TOKEN]
    echo $CR_PAT | docker login ghcr.io -u [USERNAME] --password-stdin
  6. Now run docker compose up -d.
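
As a variation on step 5, the token can be read from a file instead of being exported in the shell. This is a minimal sketch: ghcr_login is a hypothetical helper, not part of the repository, and the token file location is up to you. It relies only on the docker login options already shown above.

```shell
#!/usr/bin/env bash
# Sketch only: log in to GHCR using a token stored in a file.
# Usage: ghcr_login USERNAME TOKEN_FILE
ghcr_login() {
  username=$1
  token_file=$2
  if [ ! -r "$token_file" ]; then
    echo "cannot read token file: $token_file" >&2
    return 1
  fi
  # --password-stdin reads the token from standard input,
  # so it never appears on the command line.
  docker login ghcr.io -u "$username" --password-stdin < "$token_file"
}
```

Keeping the token in a file readable only by your user (chmod 600) avoids leaving it in shell history or in the environment of later commands.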

Data migration

Solr

Fedora

Postgres (Hyrax & Fedora)

Setting up a new production instance

Migrating Production Database

In the app-server container (i.e. through docker exec -it scholarspace-hyrax_app-server_1 /bin/sh, followed by su scholarspace), run:

bundle exec rails db:migrate RAILS_ENV=production

Creating First Admin User and Necessary Admin Sets/Collections

In the app-server container (i.e. through docker exec -it scholarspace-hyrax_app-server_1 /bin/sh, followed by su scholarspace), run this rake task - replacing email and password with your new admin user email and password:

bundle exec rails gwss:prep_new_prod RAILS_ENV=production admin_user="AN-EMAIL-ADDRESS@EXAMPLE.COM" admin_password="A-PASSWORD"

This will create the admin and content-admin roles, create an admin user with the specified email and password, create the default admin sets, and create an ETDs admin set with the admin user as the owner.

In addition, this task will precompile assets for production.

Persistence

Deployment tips (production)

Development tips

    echo "Recreating Fedora and Solr directories"
    rm -r /data/fedora && mkdir /data/fedora
    rm -r /var/solr/data && mkdir /var/solr/data && chown -R 8983:8983 /var/solr/data
    echo "Cleaning up ScholarSpace files"
    rm -r /opt/scholarspace/scholarspace-derivatives/derivatives
    echo "Removing Docker volumes"
    docker volume rm $(docker volume ls -q)


After bringing down the containers, run this script (with `sudo`) to clear out all persistent storage, including the Rails database, before bringing back up the containers. 
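
Because the cleanup is destructive, it may be worth guarding it against being run while containers are still up. A minimal sketch, with hypothetical function names that are not part of the repository. Note that `docker ps -q` lists every running container on the host, not only the ScholarSpace ones.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to continue while any container is still running.

# Succeeds when `docker ps -q` prints no container IDs.
# Fails if containers are running or if docker itself cannot be reached.
containers_stopped() {
  ids=$(docker ps -q) || return 1
  [ -z "$ids" ]
}

# Prints a hint and returns non-zero unless all containers are stopped.
require_containers_stopped() {
  if ! containers_stopped; then
    echo "Containers are still running; run 'docker compose down' first." >&2
    return 1
  fi
}
```

Placing `require_containers_stopped || exit 1` at the top of the cleanup script stops it before any directory is removed.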

# Development Mode

### Preparing Development Databases

NOTE: The following steps assume that Docker containers have launched successfully and are currently running. 

To prepare a development server:
- Run `docker exec -it scholarspace-hyrax_app-server_1 /bin/sh`
    - This will give you access to an interactive terminal (`-it`) in the container with the Rails application.
- Switch to the in-container "scholarspace" user with `su scholarspace`
- Run:
    - `bundle exec rails db:create` to create the `development` and `test` databases.
    - `bundle exec rails db:migrate` to run the database migrations for the `development` and `test` databases. 
    - `bundle exec rails db:seed` to create (in the `development` environment): 
        - An admin role and user (email and password set in `.env`)
        - A content admin role and user (email and password set in `.env`)
        - The default admin set
        - The admin set collection type
        - The user collection type
        - An `ETDs` admin set
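
The three Rails commands above can also be run in one non-interactive call from the host. This is a minimal sketch: `prep_dev_db` is a hypothetical helper, and it assumes (as the `bash -lc` commands elsewhere in this README suggest) that a login shell for the `scholarspace` user starts in the application directory with `bundle` on its path.

```shell
#!/usr/bin/env bash
# Sketch only: create, migrate, and seed the development databases in one call.
# Usage: prep_dev_db [CONTAINER_NAME]
prep_dev_db() {
  # Default container name as used in this README; check `docker ps` for yours.
  container=${1:-scholarspace-hyrax_app-server_1}
  docker exec --user scholarspace "$container" \
    bash -lc "bundle exec rails db:create && bundle exec rails db:migrate && bundle exec rails db:seed"
}
```

The `&&` chain stops at the first failing step, so a failed `db:create` does not go on to run migrations or seeds.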

### Running a Development Application

From a terminal attached to the Rails application container (i.e. through `docker exec -it scholarspace-hyrax_app-server_1 /bin/sh`), run:

- `bundle exec rails s -b 0.0.0.0` 

This command starts a Passenger/NGINX server running Rails in `development` mode on port 3000, accessible at:
- http://YOUR-EC2-INSTANCE-URL.compute.amazonaws.com:3000 (assuming AWS security groups have been configured to allow traffic on port 3000)

As you interact with the `development` application in a browser, the logs will output to the terminal where you ran the `bundle exec rails s -b 0.0.0.0` command, with a `debug` level of detail. 

To stop the `development` server, press `ctrl + c` in the terminal where you ran the command to start the server.  You can restart the development application, and previous changes (i.e. deposited works, created collections, etc) should persist.