The Racket Package Catalog Server

The Racket Package Catalog comprises two pieces of software that work in tandem:

pkg-index, a.k.a. the backend: manages the package database and user database, and periodically polls package sources to update checksums. Ultimately, all Racket package clients can poll the package server instead of checking each package source individually, although package clients do not access the backend's information directly.
website, a.k.a. the frontend: responsible for formatting the package database and rendering it as a web site, and also for publishing the current package database (as determined by the backend) to the site that package clients consult.

Both frontend and backend run in the same Racket process, but as separate server threads. Each also has additional threads for periodic and internal tasks. The frontend sends package-change requests to the backend, and it watches for periodic updates (especially new checksums) from the pkg-index backend. The servers were originally implemented separately, and there is some value to keeping them conceptually separate.

Although the implementation here is not necessarily tied to the main Racket deployment's configuration, various configuration options make the most sense in terms of the main deployment's structure:

https://pkgs.racket-lang.org/ is an S3-hosted web site, which makes it as available as possible. The content of this site is generated and uploaded by the frontend server. (Also, pkg.racket-lang.org is set up to forward to pkgs.racket-lang.org, in case someone forgets the "s" in "pkgs".)

The frontend server will tend to forward back into this static content, but the "login" button in the static view goes to the dynamic frontend server. After a user has logged in, the dynamic server tends to serve information directly, instead of forwarding. So, the server needs to be up and working well for logged-in use or gathering package updates, but not for querying the most recently published package updates.
https://pkgd.racket-lang.org/ is the server frontend and backend implemented here. More precisely, it's an Apache instance that sends most URLs to the frontend server, but any path that starts with api or jsonp is sent to the backend server (so that the backend functionality remains accessible). To the degree that the frontend server needs to talk to the backend server, however, it does so more directly.

Prerequisites

In addition to Racket v8.14.0.3 or later, you will need to install the following Racket packages:

raco pkg install --skip-installed \
     'https://github.com/racket/infrastructure-userdb.git#main' \
     reloadable \
     aws \
     s3-sync \
     plt-service-monitor

Configuration

You can run the server in test mode with make run or ./run or racket src/main.rkt with no further configuration, except that

You'll need to have a certificate in place. Use make keys to produce a self-signed certificate. The certificate and key are dropped into compiled/root so that they're in the right place for a default configuration.
Probably you'll want to seed the set of registered packages. See "Adding packages" below.

By default, all package state and generated files go into compiled in the current directory. (The current directory when you run the server doesn't have to be the top of this Git repo checkout.)

An advantage of using make run or ./run is that it sets PLTSTDERR to turn on lots of logging. You can run just the frontend as src/website/main.rkt or just the backend with src/pkg-index/main.rkt. Running just the frontend requires running the backend at least once to generate its output for the frontend, though, or configuring the frontend to use another source via package-index-url as described below.

When you use make run or ./run, it actually runs configs/${CONFIG}.rkt, so you can set CONFIG as an environment variable or makefile variable to pick a configuration there. If CONFIG is not defined, testing is used (which is an empty configuration). For example, to select configs/live.rkt, set CONFIG to live. A good place to do this is in the run-prelude file; see the description of run-prelude below.
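The selection rule can be sketched in shell. The helper name config_module is hypothetical and only mirrors the behavior described above; the real run script may differ in detail (for instance, this sketch also treats an empty CONFIG as unset).

```shell
# Hypothetical helper: print the configuration module that make run or ./run
# would load, following the rule above (configs/${CONFIG}.rkt, with "testing"
# as the fallback). Assumption: an empty CONFIG is treated like an unset one.
config_module() {
    printf 'configs/%s.rkt\n' "${CONFIG:-testing}"
}

# Show the module selected by the current environment.
config_module
```

With CONFIG set to live, this prints configs/live.rkt. The same selection can be made for a single invocation with CONFIG=live ./run or make run CONFIG=live.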

Within a configuration file, configuration details are to be given as a hashtable to main. When using the testing configuration of ./run or when using racket src/main.rkt, you can supply a --config argument to specify a module that exports a config hashtable.

Keys useful for deployment:

port: number; defaults to the value of the SITE_PORT environment variable, if defined; otherwise, 7443.
pkg-index-port: number; defaults to the value of the SITE_PKG_INDEX_PORT environment variable, if defined; otherwise, 9004.
ssl?: boolean; default is #t, unless PKG_SERVER_HTTP is defined.
reloadable?: boolean; default is #t if the SITE_RELOADABLE environment variable is defined; otherwise, #f.
recent-seconds: number, in seconds; default is 172800. Packages modified fewer than this many seconds ago are considered "recent", and displayed as such in the UI.
static-output-type: either 'aws-s3 or 'file; indicates where the frontend writes the static-site data:
- When 'file (the default),
  - static-content-target-directory: either #f or a string denoting a path to a folder to which the static content of the site will be copied for local serving.
- When 'aws-s3,
  - aws-s3-bucket+path: a string naming an S3 bucket and path. Must end with a forward slash, .../. AWS access keys are loaded per the documentation for the aws module; usually from a file ~/.aws-keys.
dynamic-urlprefix: string; absolute or relative URL, prepended to URLs targeting dynamic content on the site, i.e., for when the frontend wants to serve a link back to itself.
static-urlprefix: string; absolute or relative URL, prepended to relative URLs referring to static HTML files placed in static-generated-directory for when the dynamic server wants to refer to static content.
pkg-index-generated-directory: a string pointing to where the backend places its rendered files; the main rendered file that the frontend cares about is pkgs-all.json.gz, although the backend writes a whole package catalog and web site there.
user-directory: directory containing the user database.
email-sender-address: string; defaults to pkgs@racket-lang.org. Used as the "from" address when sending authentication emails on behalf of the server.
beat-s3-bucket: string or #f; defaults to #f; a bucket name for registering heartbeats, or #f to disable heartbeats; the region is determined automatically from the bucket name.
beat-publish-task-name: string; defaults to "pkgd-publish"; a task name for heartbeats after publishing information for all packages.
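
Several of the defaults above come from environment variables. As a rough sketch of that lookup, the hypothetical helper below prints the values a default configuration would pick up for port, pkg-index-port, and ssl?. It assumes that defining PKG_SERVER_HTTP (with any value) switches the ssl? default to #f, which the description above implies but does not state outright.

```shell
# Hypothetical helper: report the defaults implied by the environment.
# ${VAR-default} substitutes only when VAR is undefined, matching the
# "if defined" wording in the key descriptions.
show_env_defaults() {
    echo "port=${SITE_PORT-7443}"
    echo "pkg-index-port=${SITE_PKG_INDEX_PORT-9004}"
    # Assumption: any definition of PKG_SERVER_HTTP turns HTTPS off.
    if [ "${PKG_SERVER_HTTP+defined}" = defined ]; then
        echo "ssl?=#f"
    else
        echo "ssl?=#t"
    fi
}

show_env_defaults
```

For example, SITE_PORT=8080 PKG_SERVER_HTTP=1 ./run would start a test server on port 8080 and, under the assumption above, serve plain HTTP.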

Keys useful for development:

package-index-url: string; an alternative source that the frontend uses to get pkgs-all.json.gz, such as http://pkgs.racket-lang.org/pkgs-all.json.gz to pull from the live database instead of the running backend. The default is based on pkg-index-generated-directory unless the PACKAGE_INDEX_URL environment variable is defined.
package-fetch-interval: number, in seconds; default is 300.
session-lifetime: number, in seconds; default is 604800.
static-generated-directory: string; names a directory within which generated static HTML files are to be placed. Must be writable by the user running the server.
disable-cache?: boolean; default is #f; a #t value causes the frontend to always redirect to itself to serve the package dynamically, instead of redirecting to generated static files.
backend-baseurl: string; defaults to https://localhost: followed by pkg-index-port; must point to the backend package server API root, such that (for example) /jsonp/authenticate, when appended to it, resolves to the authentication call.
pkg-build-baseurl: string; defaults to http://pkg-build.racket-lang.org/. Used to build URLs relative to the package build host, such as for documentation links and build reports.
pkg-index: #f or hash table; use #f to disable the backend server entirely, or provide a hash table to configure the backend specifically.

Backend keys for a pkg-index configuration table within the main configuration:

port: number; defaults to pkg-index-port from the enclosing configuration or 9004; port on which the backend site will be served.
ssl?: boolean; defaults to ssl? from the enclosing configuration or to #t; a true value serves HTTPS and requires root/server-cert.pem and root/private-key.pem.
src: path; defaults to src/pkg-index relative to here
static.src-path: path; defaults to src/static, the location of (non-generated) HTML/JS/CSS files to be copied to static-path (see below), although these files are mostly not used anymore
static-path: path; defaults to src/static-gen; staging area where all static resources - both generated and non-generated - are written.
notice-path: path; defaults to static-path/notice.json; whenever the server has a message for site users, the message will be placed in this file.
root: path; defaults to pkg-index-generated-directory from the outer configuration; determines several other defaults
users.new-path: path; defaults to user-directory from the outer configuration, which defaults to pkg-index-generated-directory/users.new; directory in which to hold user records, one file per user
cache-path: path; defaults to root/cache; names a directory where files summary.rktd and summary.rktd.etag will be created.
pkgs-path: path; defaults to root/pkgs; names a directory where one file of package information for each package in the catalog will be stored.
github-client_id (obsolete): string or #f; defaults to the contents of the file at root/client_id, if it exists; should be a GitHub client ID string (hex; twenty characters long, i.e. 10 bytes of data, hex-encoded), used only if package downloading is forced to use the GitHub API by setting the PLT_USE_GITHUB_API environment variable.
github-client_secret (obsolete): string or #f; defaults to the contents of the file at root/client_secret, if it exists; should be a GitHub client secret string (hex; forty characters long, i.e. 20 bytes of data, hex-encoded), used only when github-client_id is used.
s3-bucket: string or #f; defaults to the contents of the environment variable S3_BUCKET, if it is defined, #f otherwise; AWS credentials are found by the s3 package, typically from ~/.aws-keys; if set to #f, S3 synchronization will be disabled.
s3-bucket-region: string; defaults to the contents of the environment variable S3_BUCKET_REGION, if it is defined; otherwise, to #f; needs to be non-#f if s3-bucket is.
beat-s3-bucket: string or #f; defaults to beat-s3-bucket from the enclosing configuration table or #f; a bucket name for registering heartbeats, or #f to disable heartbeats; the region is determined automatically from the bucket name.
beat-update-task-name: string; defaults to "pkgd-update"; a task name for heartbeats after updating information for all packages.
beat-upload-task-name: string; defaults to "pkgd-upload"; a task name for heartbeats after uploading information for all packages.
redirect-to-static-proc: function from HTTP request to HTTP response, which should issue a redirect pointing to a static resource; defaults to a function which replaces the scheme with the contents of the configuration variable redirect-to-static-scheme, the host with redirect-to-static-host, and the port with redirect-to-static-port. These, in turn, default to "http", "pkgs.racket-lang.org" and 80, respectively.
atom-self-link: string; defaults to https://pkg.racket-lang.org/rss; used as the rel=self link in the header of the generated ATOM feed.
atom-link: string; defaults to https://pkg.racket-lang.org/; used as the default site link in the header of the generated ATOM feed.
atom-id: string; defaults to https://pkg.racket-lang.org/; used as the ATOM feed ID.
atom-compute-package-url: function from package-name symbol to URL string; defaults to a function which calls format with the package name and a format template-string from atom-package-url-format-string, which in turn defaults to http://pkg.racket-lang.org/#[~a].

Development Setup

Adding packages

Instead of manually adding packages to a fresh instance of the package web server, use raco pkg catalog-copy to copy an existing catalog into a directory tree. Then, move the pkg directory (no "s") in the catalog copy to be root/pkgs (with "s") where root is the server's root directory — so, "compiled/root/pkgs" by default.

  $ raco pkg catalog-copy https://pkgs.racket-lang.org compiled/pkgs-copy
  $ mkdir -p compiled/root/pkgs
  $ mv compiled/pkgs-copy/pkg/* compiled/root/pkgs/

Beware, however, that the backend will start by updating the checksum of every package, so consider using a specific package name in place of * or a glob that selects a small set of packages.
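
To seed only a few packages, the copy step can be wrapped in a small helper. This is a sketch: seed_packages is a hypothetical function, not part of the repository, and the package names in the usage note are only examples.

```shell
# seed_packages COPY_DIR ROOT_DIR NAME...
# Copy the named entries from COPY_DIR/pkg (as produced by
# raco pkg catalog-copy) into ROOT_DIR/pkgs, leaving everything else out.
seed_packages() {
    copy_dir=$1
    root_dir=$2
    shift 2
    mkdir -p "$root_dir/pkgs" || return 1
    for name in "$@"; do
        if [ -e "$copy_dir/pkg/$name" ]; then
            cp -R "$copy_dir/pkg/$name" "$root_dir/pkgs/"
        else
            # Report names that are absent from the catalog copy.
            echo "seed_packages: not in catalog copy: $name" >&2
        fi
    done
}
```

After the catalog-copy step, seed_packages compiled/pkgs-copy compiled/root rackunit-lib scribble-lib would register just those two packages, so the backend's initial checksum pass stays short.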

Warm up

When the server is started, the backend starts by looking for new checksums, while the frontend immediately checks for backend updates. Since those run concurrently, you may not immediately see updates via the frontend even when the backend has completed its scan (which you might infer from logging output). At that point, restarting is the fastest way to warm up the frontend.

Adding users

You can use the website frontend to add a user. Email for a new user is sent via sendmail, so if you don't have that configured, just watch the logs to see the token that would have been sent.

Automatic code reloading

If you would like to enable the automatic code-reloading feature, set the environment variable SITE_RELOADABLE to a non-empty string or set the reloadable? configuration variable to #t.

You must also delete any compiled code .zo files. Otherwise, the system will not be able to correctly replace modules while running.

Therefore, when using automatic code reloading, use just

make run

and make sure to run make clean beforehand, if you've run make compile at all previously.
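
A quick way to confirm that no compiled code is left over is to search for .zo files before starting the server. The helper below is a sketch; list_compiled_zo is a hypothetical name.

```shell
# list_compiled_zo [DIR]
# Print every compiled .zo file under DIR (default: the current directory).
# No output means there is nothing left to interfere with code reloading.
list_compiled_zo() {
    find "${1:-.}" -type f -name '*.zo'
}
```

If list_compiled_zo prints nothing after make clean, the server can be started with reloading enabled, for example with SITE_RELOADABLE=yes make run.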

Deployment

Static Content

The site can be set up to run either

entirely dynamically, generating package pages on-the-fly for each request;
both statically and dynamically, with HTML renderings of package pages stored on and served from disk like other static resources such as Javascript and CSS; or
both statically and dynamically, as the previous option, but additionally replicating both static and generated content to a local file-system directory and invoking an optional update hook that can be used to further replicate the content to S3 or a remote host.

The default is mixed static/dynamic, with no additional replication.

For a fully dynamic site, set configuration variable disable-cache? to #t.

To enable replication, set configuration variable static-content-target-directory to a non-#f value, and optionally set static-content-update-hook to a string containing a shell command to execute every time the static content is updated.

S3 Content

To set up an S3 bucket — let's call it s3.example — for use with this site, follow these steps:

Create the bucket ("s3.example")
Optionally add a CNAME record to DNS mapping s3.example to s3.example.s3-website-us-east-1.amazonaws.com. If you do, static resources will be available at http://s3.example/; if not, at the longer URL.
Enable "Static Website Hosting" for the bucket. Set the index document to index.html and the error document to not-found.

Then, under "Permissions", click "Add bucket policy", and add something like the following.

{
  "Id": "RacketPackageWebsiteS3Policy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RacketPackageWebsiteS3PolicyStmt1",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::s3.example",
                   "arn:aws:s3:::s3.example/*"],
      "Principal": {
        "AWS": ["<<<ARN OF THE USER TO WHOM ACCESS SHOULD BE GRANTED>>>"]
      }
    }
  ]
}

The user will need to be able to read and write objects and set CORS policy. (CORS is configured automatically by code in src/static.rkt.)

Supervision

Startable using djb's daemontools; symlink this directory into your services directory and start it as usual. The run script starts the program, and log/run sets up logging of stdout/stderr.

If the file run-prelude exists in the current directory on startup, it will be dotted in before racket is invoked. A prelude is useful to update PATH for a locally-built racket bin directory or to select an appropriate CONFIG setting.
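
A run-prelude might look like the following sketch. The directory /opt/racket/bin is a made-up location for a locally built Racket, and live is just the example configuration name used earlier.

```shell
# Hypothetical run-prelude. The file is sourced by the run script, so
# these settings take effect before racket is invoked.

# Put a locally built Racket first on PATH (the path is an assumption).
PATH="/opt/racket/bin:$PATH"
export PATH

# Select configs/live.rkt instead of the default testing configuration.
CONFIG=live
export CONFIG
```

Because the file is sourced rather than executed, it should not call exit, which would terminate the run script itself.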

On Debian, daemontools can be installed with apt-get install daemontools daemontools-run, and the services directory is /etc/service/.

Control signals

You can send signals to the running service by creating files in /etc/service/webservice/signals/. For example:

creating .pull-required causes the server to shell out to git pull and then exit. Daemontools will restart it.
creating .restart-required causes it to exit, to be restarted by daemontools.
creating .reload causes an explicit code reload. Useful when automatic code reloading is disabled.
creating .fetchindex causes an immediate refetch of the package index from the backend server.
creating .rerender causes an immediate rerendering of all generated static HTML files.

See src/signals.rkt for details of the available signals.

So long as the signals directory is writable by them (for example, after sudo chmod 0777 /etc/service/webservice/signals), these signals are useful for non-root administrators to control the running service.

In particular, a git post-receive hook can be used to create the .pull-required signal in order to update the service on git push.
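
Creating a signal file can be wrapped in a small helper, which also guards against typos in the signal name. This is a sketch: send_signal and the SIGNALS_DIR variable are hypothetical, and the accepted names are limited to the signals listed above.

```shell
# Signals directory from the text above; set SIGNALS_DIR to override it.
SIGNALS_DIR=${SIGNALS_DIR:-/etc/service/webservice/signals}

# send_signal NAME
# Create the signal file .NAME in SIGNALS_DIR, e.g. "send_signal rerender".
send_signal() {
    case $1 in
        pull-required|restart-required|reload|fetchindex|rerender)
            touch "$SIGNALS_DIR/.$1" ;;
        *)
            echo "send_signal: unknown signal: $1" >&2
            return 1 ;;
    esac
}
```

For instance, send_signal rerender requests a re-render of the static HTML files, and a git post-receive hook could call send_signal pull-required as its last step.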

Copyright and License

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
