kuhumcst / glossematics

The life of Louis Hjelmslev.
https://glossematics.dk

Glossematics

Infrastrukturalisme is a joint project between the University of Copenhagen and Aarhus University. Its goal is to publish a web app on glossematics.dk that lets researchers from around the world explore the life of the Danish linguist Louis Hjelmslev by making material from previously unpublished primary sources available, e.g. letters sent and received by the linguist.

Architecture

The source code has been split into separate backend and frontend directories located within /src/dk/cst/glossematics. The backend and frontend are written in Clojure and ClojureScript, respectively.

The main backend and frontend namespaces contain the routing tables and are therefore natural entry points for anyone wanting to explore the source code.

For now, the source code of Pedestal SP is also included in this repository (at /src/dk/cst/pedestal/sp), but it will eventually be spun off into its own git repository.

Frontend

The frontend is a so-called single-page app (SPA) written in ClojureScript, built with reitit and reagent. This app includes a facsimile viewer implemented using stucco and rescope. The viewer displays facsimiles and transcriptions in parallel. It supports transcriptions written in a subset of the TEI standard.

When someone visits the Glossematics website they will always be served the exact same HTML page. This page embeds a blob of JavaScript code which has been compiled from ClojureScript. It is this JavaScript code that creates the actual content on the page, occasionally fetching data from the backend in the background.

Backend

The backend is a Pedestal web service written in Clojure that uses SAML for authentication/authorisation by way of Pedestal SP. Jetty is used to serve the raw content, while nginx acts as a reverse proxy/gateway in production with SSL certificates regularly updated by Let's Encrypt's certbot. The production multi-container system is built and run using Docker.

The primary responsibility of the backend is

  1. serving the HTML page containing the frontend SPA and
  2. responding to API calls made from the SPA.

Server setup

In order for SAML encryption to work, a Java keystore needs to be created. This is the idiomatic way to handle certificates of any kind in Java... and therefore in Clojure. I generate a new keystore file from scratch both in production and on the development machine:

keytool -keystore /etc/glossematics/keystore.jks -keyalg RSA -genkey -alias glossematics

I also install the certbot tool from Let's Encrypt and run it exactly once to obtain an initial set of SSL certificates for HTTPS (stored in certbot's default location). These are picked up by the docker compose setup as part of the nginx-reverse-proxy service. All subsequent renewals are handled automatically by that Docker container, which runs the nginx instance and runs certbot at a regular interval.

In addition to the keystore and the SSL certificates, the IdP's certificate needs to be present on disk too. In our case, we need the WAYF certificate for the production server (and a self-supplied one for development). This public certificate doesn't need to be put inside a keystore. For local development, I run a "fake" IdP on localhost:7000 that I set up based on the guide found here.

Running the system in production

The supplied docker-compose.yml file is used to build and run the system in production. It sets up two containers: one that contains the backend Clojure web service and one running nginx+certbot that facilitates public HTTPS access to the content. The ClojureScript frontend is also compiled as part of the build and subsequently served by the backend.

The following environment variables must be set to establish the volumes used by the Docker containers:

GLOSSEMATICS_CONF=${HOME}/.glossematics/conf.edn
GLOSSEMATICS_IDP_CERTIFICATE=${HOME}/.glossematics/idp-certificate.pem
GLOSSEMATICS_SAML_KEYSTORE=${HOME}/.glossematics/saml-keystore.jks
GLOSSEMATICS_FILES_DIR=${HOME}/.glossematics/files
GLOSSEMATICS_DB_DIR=${HOME}/.glossematics/db

These volumes allow the container to access this content in the local filesystem. Assuming the files all exist in those locations, these lines should ideally be put inside a .env file located in the docker/ directory to allow them to be picked up by the docker-compose.yml file.
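A minimal sketch of that step, assuming bash and the default ${HOME}/.glossematics layout listed above; the function name write_glossematics_env is hypothetical and not part of the repository. It refuses to write the .env file if any of the expected files or directories is missing, and it expands the base path at write time so the .env file only contains absolute paths:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_glossematics_env [BASE_DIR] [OUT_FILE]
# Checks that every file/directory behind the Docker volumes exists, then
# writes the .env file that docker compose reads from the docker/ directory.
write_glossematics_env() {
  local base=${1:-"$HOME/.glossematics"}  # where conf, certs and data live
  local out=${2:-docker/.env}             # .env file next to docker-compose.yml
  local missing=0 path

  # The config, IdP certificate and SAML keystore are single files...
  for path in "$base/conf.edn" "$base/idp-certificate.pem" "$base/saml-keystore.jks"; do
    [ -f "$path" ] || { echo "missing file: $path" >&2; missing=1; }
  done

  # ...while the files and db volumes are directories.
  for path in "$base/files" "$base/db"; do
    [ -d "$path" ] || { echo "missing directory: $path" >&2; missing=1; }
  done

  # Do not write a .env that points at paths that are not there.
  [ "$missing" -eq 0 ] || return 1

  # Unquoted EOF: $base is expanded now, so the file holds absolute paths.
  cat > "$out" <<EOF
GLOSSEMATICS_CONF=$base/conf.edn
GLOSSEMATICS_IDP_CERTIFICATE=$base/idp-certificate.pem
GLOSSEMATICS_SAML_KEYSTORE=$base/saml-keystore.jks
GLOSSEMATICS_FILES_DIR=$base/files
GLOSSEMATICS_DB_DIR=$base/db
EOF
}
```

Run from the repository root, e.g. write_glossematics_env ~/.glossematics docker/.env. Missing paths are reported on stderr and nothing is written.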

The docker compose command is used to build and start the project from inside the docker/ directory:

# build, start, write output to shell
docker compose up --build

# the same, but run in detached mode
docker compose up -d --build

The systemd unit file

While docker compose is the actual process used to start the service, automatic startup on boot is facilitated by systemd.

The docker/ directory includes the systemd unit file glossematics.service used to register the service. This file should be copied to /etc/systemd/system/glossematics.service and the relevant services enabled:

cp docker/glossematics.service /etc/systemd/system/glossematics.service
systemctl daemon-reload
systemctl enable docker
systemctl enable glossematics

When this is done (assuming the paths in the file are correct), the Glossematics service will start automatically on boot and be managed according to the docker-compose.yml specification.

Running the backend locally

The Docker setup is only meant to run in production as it relies on an HTTPS setup using authenticated certificates. For local development, the backend server should be started in the REPL and accessed via port 8080 (regular HTTP). This is the same port that is proxied in the Docker setup.

There is no good reason to put the content behind HTTPS on the local development machine (doing so will lead to all sorts of trouble), but it might be a good idea to test the Clojure web service on its own, running locally through Docker. In that case, you can simply comment out the nginx-reverse-proxy part of the docker-compose.yml file and run the setup locally.

Development prerequisites

The development workflow of the project itself is built around the Clojure CLI for managing dependencies and shadow-cljs for compiling ClojureScript code and providing a live-reloading development environment.

In this project, the dependency management feature of shadow-cljs is not used directly. Rather, I leverage the built-in support in shadow-cljs for the Clojure CLI/deps.edn to download dependencies and build a classpath.

I personally use IntelliJ with the Cursive plugin which integrates quite well with the Clojure CLI.

macOS setup

(assuming homebrew has already been installed)

I'm not sure which JDK version you need, but anything 8+ is probably fine! I personally just use the latest Eclipse Temurin release (the successor to AdoptOpenJDK):

brew install --cask temurin

The following will get you the Clojure CLI and shadow-cljs, along with NodeJS:

brew install clojure/tools/clojure
brew install node
npm install -g shadow-cljs

JavaScript dependencies must be installed separately, e.g. by running npm install in the project root!

Workflow

Frontend

Frontend development is done using the live-reloading capabilities of shadow-cljs:

shadow-cljs watch app

This will start a basic web server at localhost:9000 serving the :app build as specified in the shadow-cljs.edn file.

It's possible to execute unit tests while developing by also specifying the :test build:

shadow-cljs watch app test

This will make test output available at localhost:9100. It's quite convenient to keep a separate browser tab open just for this. The favicon will be coloured green or red depending on the state of the assertions.

Personally, I use the Clojure CLI integration in Cursive to calculate a classpath and download dependencies. Something like this command is being executed behind the scenes:

clj -A:app:test -Spath

I have also set up some aliases in my personal ~/.clojure/deps.edn file to perform certain common tasks such as listing/updating outdated packages:

clj -A:outdated
clj -A:update

Backend

The namespace dk.cst.glossematics.backend defines the backend web service. This Pedestal web service can be started, stopped, restarted, and updated entirely through the Clojure REPL using the utility functions located inside that namespace. There is no need to install and set up a separate web server, since Pedestal dynamically sets up a Jetty instance for this purpose.

The frontend being served is whichever one was compiled last, which will usually be the development version created by running shadow-cljs watch app. The release version can be used instead by running:

shadow-cljs release app

... and making sure that dk.cst.glossematics.backend.index is reloaded in the REPL.

Development notes

Periodic security check-up

... via the National Vulnerability Database.

NOTE: this should be run periodically and the software should be updated accordingly to keep the threat level as low as possible!

First install nvd-clojure as a tool as per the instructions, e.g.

clojure -Ttools install nvd-clojure/nvd-clojure '{:mvn/version "2.10.0"}' :as nvd

Then run the actual task for all relevant aliases:

clojure -J-Dclojure.main.report=stderr -Tnvd nvd.task/check :classpath \""$(clojure -Spath -A:frontend:build)\""

This prints a summary to the terminal and creates a detailed report in several file formats, of which the HTML report is likely the most useful.
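The escaped quotes around $(clojure -Spath ...) in the command above are there because arguments passed to a tool via -T are read as EDN, so the classpath has to arrive as an EDN string (wrapped in double quotes) rather than as a bare symbol. A small sketch of what the shell actually passes, using the same quoting; fake_classpath is a hypothetical stand-in for clojure -Spath and its value is made up:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for $(clojure -Spath -A:frontend:build); made-up value.
fake_classpath() {
  printf '%s' 'src:resources:/home/user/.m2/repository/example.jar'
}

# Same quoting as the nvd command:
#   \"                       a literal double quote
#   "$(fake_classpath)\""    the classpath followed by a literal double quote,
#                            all inside one double-quoted (unsplit) word
arg=\""$(fake_classpath)\""

printf '%s\n' "$arg"   # prints "src:resources:/home/user/.m2/repository/example.jar"
```

The same rule applies to the :output '"deps.png"' argument used for the dependency graph: the single quotes protect the double quotes, which make deps.png an EDN string.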


More help:

# print dependencies
clj -A:frontend:build -Stree

# locate transitive dep; replace 'bad-dependency' with an actual dependency
clj -A:frontend:build -Stree | grep -B 50 bad-dependency

# save dependency tree to a graph (deps.png)
clj -A:frontend:build -X:graph graph :output '"deps.png"'
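The grep -B 50 approach only shows a fixed number of preceding lines, which may or may not reach the top-level dependency that pulls in the culprit. A minimal sketch of a more targeted alternative, assuming the -Stree output indents each nesting level by a consistent number of spaces; dep_path is a hypothetical helper, not part of the Clojure CLI. For every line containing the given text, it prints the full chain of ancestors:

```shell
#!/usr/bin/env bash
# Hypothetical helper: dep_path TEXT
# Reads an indented dependency tree (e.g. the output of clj -Stree) on stdin
# and, for every line containing TEXT, prints that line together with all of
# its ancestors, followed by a "--" separator.
dep_path() {
  awk -v pat="$1" '
    {
      match($0, /^ */)        # RLENGTH = number of leading spaces
      depth = RLENGTH
      line[depth] = $0        # most recent entry seen at this depth
      if (index($0, pat)) {   # plain substring match, no regex
        # The tree is printed in pre-order, so the latest entry at each
        # shallower depth is an ancestor of the current line.
        for (d = 0; d <= depth; d++)
          if (d in line) print line[d]
        print "--"
      }
    }'
}
```

Usage: clj -A:frontend:build -Stree | dep_path bad-dependency. The design relies only on indentation, so it keeps working regardless of how deep the offending dependency sits.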

Server permissions

The Docker container is run by a user created for the purpose. However, access to the volumes on the host is predicated on the existence of a group with id 1024 owning both volumes!

I ran the following on the KU server to prepare it:

chown -R :1024 /data
chmod -R 775 /data
chmod g+s /data

Based on advice found here: https://medium.com/@nielssj/docker-volumes-and-file-system-permissions-772c1aee23ca
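After running the commands above, a quick check can catch a directory that was missed. A minimal sketch, assuming Linux with GNU coreutils stat (the macOS/BSD stat flags differ); check_volume_perms is a hypothetical helper, and 1024 is its default group id only because that is the id mentioned above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_volume_perms DIR [GID]
# Verifies that DIR is owned by group GID, is group-writable, and has the
# setgid bit set (so new files and subdirectories inherit the group).
check_volume_perms() {
  local dir=$1 gid=${2:-1024} actual mode
  actual=$(stat -c '%g' "$dir") || return 2   # numeric group id of DIR
  mode=$(stat -c '%A' "$dir")                  # e.g. drwxrwsr-x
  if [ "$actual" != "$gid" ]; then
    echo "$dir: group is $actual, expected $gid" >&2; return 1
  fi
  # Character 5 of the mode string is the group write bit...
  if [ "${mode:5:1}" != "w" ]; then
    echo "$dir: not group-writable" >&2; return 1
  fi
  # ...and character 6 is group execute, shown as s/S when setgid is set.
  case "${mode:6:1}" in
    s|S) ;;
    *) echo "$dir: setgid bit not set" >&2; return 1 ;;
  esac
  echo "ok: $dir"
}
```

Usage: check_volume_perms /data. Note that chmod g+s /data only sets the setgid bit on /data itself; on Linux, subdirectories created afterwards inherit it, but subdirectories that already existed do not, so checking those will report a missing setgid bit.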

Timeline widget

The timeline used in the frontend is a fork of the obsolete SIMILE Timeline. The underlying JavaScript source code has been taken directly from the SIMILE project and reduced significantly in size. This JS source is then wrapped in ClojureScript in timeline.cljs.

However, the JavaScript source is still quite huge and written in a fairly incoherent style. For this reason, the visual style of the timeline must be handled by editing primarily

but also various other JS/CSS files depending on what needs to be changed.