jenkins-pipeline-cache-plugin
A cloud native file cache for Jenkins pipelines. The files are stored in an S3 bucket. The functionality is very similar to the caching provided by GitHub Actions.

Motivation

The primary goal is to provide a file cache for so-called hot agent nodes. Those nodes are started on demand when an execution is scheduled by Jenkins and killed after the execution has finished (e.g. by using the kubernetes-plugin or nomad-plugin). This works well but also has some drawbacks, and some of them can be mitigated by having a file cache in place (e.g. to cache build dependencies, statistics for code analysis, or any other data you want to be present for the next build execution).

Installation

For automated installations via plugins.txt you can use an entry like the one below:

jenkins-pipeline-cache::https://github.com/j3t/jenkins-pipeline-cache-plugin/releases/download/0.2.0/jenkins-pipeline-cache-0.2.0.hpi

Configuration

The plugin needs to be granted a set of permissions on the S3 bucket it uses as cache storage.
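As a rough sketch, and based on the behaviour described in this README (caches are downloaded, uploaded, resolved by key prefix, and removed by the cleanup task), a bucket policy along the following lines should cover the required operations. The bucket name my-jenkins-cache-bucket is a placeholder and the exact action set is an assumption, not an official requirement:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PipelineCacheObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-jenkins-cache-bucket/*"
    },
    {
      "Sid": "PipelineCacheBucketListing",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-jenkins-cache-bucket"
    }
  ]
}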

Usage

Below you can find an example where the local maven repository of the spring-petclinic project is cached.

node {
    git(url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
    cache(path: "$HOME/.m2/repository", key: "petclinic-${hashFiles('**/pom.xml')}") {
        sh './mvnw package'
    }
}

The path parameter points to the local maven repository, and the key parameter is the hash sum of all maven poms, prefixed by the project name and a dash.

The hashFiles method is optional but can be helpful to generate more precise keys. The idea is to collect all files which have an impact on the cached content and to create a hash sum from them (e.g. hashFiles('**/pom.xml') creates one hash sum over all maven poms in the workspace).

When the job is executed, the plugin first tries to restore the maven repository from the cache by using the given key. Then the inner step is executed, and if it completes successfully and the cache doesn't exist yet, the path gets cached.
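The same pattern can be applied to other build tools. As a sketch, a Gradle project might cache its dependency directory and key it on the Gradle build scripts; the repository URL and key prefix below are purely illustrative:

node {
    // illustrative repository, replace with your own project
    git(url: 'https://github.com/example/gradle-demo', branch: 'main')
    // $HOME/.gradle/caches holds the downloaded Gradle dependencies
    cache(path: "$HOME/.gradle/caches", key: "gradle-demo-${hashFiles('**/*.gradle')}") {
        sh './gradlew build'
    }
}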

Below you can find a complete list of the cache step parameters:

| Name | Required | Description | Default | Example |
|------|----------|-------------|---------|---------|
| path | x | Path to the directory which should be cached (absolute or relative to the workspace). | | $HOME/.m2/repository - cache the local maven repository |
| key | x | Identifier which is assigned to the cache. | | maven-4f98f59e877ecb84ff75ef0fab45bac5 |
| restoreKeys | | Additional keys which are used when the cache gets restored. The plugin tries to resolve them in the defined order (the key first, then the restoreKeys); if no exact match is found, the latest cache whose key starts with one of the given prefixes gets restored (see the example below the table). | | ['maven-', 'petclinic-'] - restore the latest cache whose key starts with maven- or petclinic- if the key does not exist |
| includes | | Ant-style pattern applied to the path to filter the files which are included. | **/* - includes all files | **/*.xml or **/*.xml,**/*.html |
| excludes | | Ant-style pattern applied to the path to filter the files which are excluded. | Excludes no files | see includes |
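For instance, restoreKeys and excludes could be combined with the petclinic example like this; the fallback prefix and the exclude pattern are only illustrative choices:

node {
    git(url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
    // fall back to the most recent 'petclinic-' cache when the exact key does not exist,
    // and keep locally built snapshot artifacts out of the cache (illustrative pattern)
    cache(path: "$HOME/.m2/repository",
          key: "petclinic-${hashFiles('**/pom.xml')}",
          restoreKeys: ['petclinic-'],
          excludes: '**/*-SNAPSHOT/**') {
        sh './mvnw package'
    }
}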

Storage providers

Any S3-compatible storage provider should work. MinIO has first-class support because all the integration tests are executed against MinIO.

In order to use an alternative provider, you probably have to change the Endpoint parameter.

Cleanup

You can define a threshold in megabytes if you want to limit the total cache size. If the value is greater than 0, the plugin checks the total cache size every hour and, when the threshold is exceeded, removes the least recently used items until the total size is below the threshold again (LRU eviction).

Disclaimer

Anyone who can create or execute build jobs basically also has access to all caches. An 'attacker' just needs a way to execute the plugin and has to know the key which is assigned to a particular cache. There is no list where all the keys are published, but the build logs contain them. The plugin guarantees that the same key is not created twice and that an existing key is not replaced, but it does not guarantee that a restored cache has not been manipulated by someone else who has access to the S3 bucket, for example.

As general advice, sensitive data, or data which cannot be restored or regenerated from somewhere else, should not be stored in caches. It should also not be a big deal, besides a longer build, if a cache has been deleted (e.g. by accident, by the cleanup task, by a data crash or ...).

Pitfalls

Further reading