This repo builds and stores singularity images to be used in lab workflows.
The set of image types that are available are displayed on the right under "packages".
These are the current image types:
geospatial_plus
: Builds on the rocker geospatial docker image.
includes a few additional packages (including data.table, intervalaverage and terra)
also has configured the R user library location to avoid clashing with the user library on the host machine.
geospatial_plus_pdf
: geospatial_plus with the addition of a full tex installation. This should be able to knit PDFs without relying on tinytex needing to download tex installation files.
geospatial_plus_ml
geospatial_plus_ml_horovod
nationalspatiotemporal
neogeo_portable
mkl_centos7
ess
If you'd like to test out an image interactively, try
singularity shell oras://ghcr.io/kaufman-lab/geospatial_plus:4.1.0
Note that singularity will cache remote images, so using the oras URI (as above) may be more convenient than finding a storage location for the .sif file. However, there may be a tradeoff since using the URI pulls a hash so there is probably a slight delay to starting the image. If you want to instead download the image, you can use singularity pull
.
Please disregard the instructions under github packages on how to pull these images using docker commands. These are not docker images and those commands will not work. Those instructions are auto-generated by github and I don't know how to turn them off.
Submit an issue if you'd like additional packages or software built into an image.
Depending on the complexity, we may start a new image type (package) or add to an existing image.
Within a version-tag (e.g. geospatial_plus:4.1.0), R packages built into the image are stable, even when the image (including the underlying rocker image) is rebuilt because of how versioned rocker images install packages. Installing new R packages into the user library, while not recommended (it's better to request a package be added to the image), is also stable (the same version will always be installed) because install.packages by default downloads from a specific date's snapshot of the rstudio package manager. Python packages built into the image are stable because they are all manually specified as pip install commands with a version specified. However, unlike for R packages, installing python packages into the user directory is not reproducible because pip install will just download the newest version. If anyone is aware of a tool that makes pip install point to a specific day's snapshot of pypi (similar to how rstudio package manager/MRAN works) please let me know.
For images designed to host an IDE: While these images can be used non-interactively to run R and Python scripts (e.g. via singularity exec oras://ghcr.io/kaufman-lab/geospatial_plus:4.1.0 Rscript scriptname.R
) without any additional options, if you want to use the image interactively for development (e.g. via singularity shell
or singularity instance
) you should --bind
a directory on the host machine to /pseudohome
. This will allow installation of R and Python packages to an instance/shell-specific directory thus fully isolating the instance even during development. Without first binding pseudohome
, user installation of packages will fail because pseudohome does not exist inside the image. This is arguably a good thing because it prevents accidental installation of packages or accidental loading of packages installed in the user library when images are being used for production.
This repository uses a pre-commit hook to auto-generate definition files. If you want to edit a definition, do not edit the definition files directly. Instead, edit the contents of build_scripts which get automatically pasted together into definition files during the pre-commit hook. The pre-commit hooked is tracked as part of the repository. In order to use it, you'll need to tell git to use a non-default hook directory and then set the pre-commit hook as executable.
git config --local core.hooksPath githooks/
chmod -R u+x ./githooks/pre-commit
This auto-generated definition file strategy allows different images to rely on shared code without relying on sequential builds which would be slow and also harder to figure out all the code that was used to create an image.
Note that github action build jobs will fail if the definition file isn't detected to be the exact result of the pasted-together contents of build_scripts. This ensures everyone uses the pre-commit hooks and mitigates the consequences of accidental direct editing of the definition files.
Note that the pre-commit hook calls an R script, so this is a local dependency as well as in the action where that script is run again to ensure it had been run as a pre-commit hook.
During the pre-commit hook, create_def_files.R (stored in the githooks directory) pastes together the contents of the build_scripts directory in the following way to create definition files:
Each directory in the build_scripts directory corresponds to an image. The directory name will determine name of the definition file (.def will be appended) which will be placed at the git repo root.
For each image, a definition file is constructed from the following files, corresponding to sections of a singularity definition file. All of these files are optional. If any of these files is not in the image directory it is silently skipped. (note bootstrap is only optional if a dependency structure is specified. see below).
bootstrap
%setup
%files
%post
%environment
an image directory also may contain a dependencies.R script. This script is used to define the dependency structure of an image. Ie, the geospatial_plus_ml4.1.0 image depends on geospatial_plus__4.1.0. This means that all the code in the geospatial_plus4.1.0 definition file is run prior to any geospatial_plus_ml4.1.0-specific definition code. This is done a on a section-by-section basis. So the %post section of geospatial_plus_ml4.1.0's definition file is constructed first from geospatial_plus4.1.0's post section, then from geospatial_plus_ml4.1.0's post section. Then this process is repeated again for the environment section. The one exception is the bootstrap section. Each singularity definition can only have one bootstrap, so an image directory must contain either a bootstrap file or a dependency.R file. If a dependency file is present, the bootstrap file from the first dependency is used as the image's bootstrap.
Specifying the dependency.R script: this should be an R script that creates an R object called dependencies, which is a length-1 vector containing a directory name in build_scripts. Do not include an image's own directory name in the dependency file. If an image has no dependencies, just omit the dependency.R file entirely.
This repo uses github actions to build images. There are three triggers:
A couple more notes:
Inspiration for this repo, along with more complicated examples for building multiple images can be found here: https://github.com/singularityhub/github-ci