woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

Pipeline-scoped (temporary) volumes [solves caching, preparation steps, inter step communication, ...] #1452

Closed · smainz closed this 1 year ago

smainz commented 1 year ago

Clear and concise description of the problem

In some pipelines/workflows it is necessary to store data created by one step outside the source directory for use in other steps, e.g. build caches, the results of preparation steps, or files for inter-step communication.

At the moment the only way to share this between steps is to fall back on workarounds.

Suggested solution

Main idea

Take the idea Drone has already implemented: add a pipeline-scoped configuration for volumes, and change the semantics of a step's volume configuration to reference these volumes.

On pipeline/workflow level:

```yaml
volumes:
  - name: my-temp-volume
    type: temp
```

In services / steps:

```yaml
steps:
  - name: Interact with Docker in Docker
    volumes:
      - name: my-temp-volume   # references a pipeline-scoped volume
        path: /var/run/docker  # path inside the container to mount the volume to
    commands:
      - docker ps -a

services:
  - name: Docker in Docker
    image: docker/dind
    volumes:
      - name: my-temp-volume   # references a pipeline-scoped volume
        path: /var/run/docker  # path inside the container to mount the volume to
```

Temporary volumes should be created before the pipeline runs and deleted after the last step of the pipeline has finished. The use of temporary volumes should not require privileged pipelines.

Possible enhancements

Volumes of different kinds

This could be enhanced on the pipeline level to provide different types of volumes:

```yaml
volumes:
  - name: my-host-volume
    type: host
    path: /some/absolute/path
  - name: my-docker-volume
    type: docker
    volume: some-docker-volume
    create: false  # only works if such a volume already exists
```

To solve the caching issue auto-magically, there could be volumes of type cache, handled by some Woodpecker magic if still required. I consider a general caching solution a hard problem, but it could work something like this:

```yaml
volumes:
  - name: some-kind-of-cache
    type: cache
    refresh: ...
    additional-cache-config: ...
```

Volumes with different scopes

One could provide volumes with different scopes (pipeline vs. workflow). This would require the possibility to provide configuration on the pipeline level (spanning multiple workflows).

Quota for volumes

The worst thing a step could do is fill up the disk where the volumes are stored. There could be limits on that, defined on the agent level or in the project settings. This needs further ideas on how and what to limit.

To be discussed

What would be the requirements for matrix builds? E.g.:

```yaml
matrix:
  GO_VERSION:
    - 1.4
    - 1.3

volumes:
  - name: my-volume-${GO_VERSION}
    type: temp
```

or something else?
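To make the matrix question concrete, here is a minimal sketch of how matrix values could be interpolated into volume names. The `expand_volume_name` helper is hypothetical; Woodpecker's real variable substitution may behave differently.

```python
import re

def expand_volume_name(template: str, matrix_env: dict) -> str:
    """Substitute ${VAR} placeholders in a volume name with matrix values.

    Hypothetical helper: unknown variables are left untouched.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(matrix_env.get(m.group(1), m.group(0))),
        template,
    )

# Each matrix axis value would yield its own temporary volume name:
print(expand_volume_name("my-volume-${GO_VERSION}", {"GO_VERSION": "1.4"}))
```

Under this scheme, each matrix combination would get its own isolated temporary volume.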

Alternative

At the moment the following is possible (I don't know whether it is intentional):

```yaml
  - name: Test volumes (1)
    image: alpine
    volumes:
      - volume-test_${CI_PIPELINE_NUMBER}:/x
    commands:
      - ls -lah /x
      - touch /x/file
      - ls -lah /x

  - name: Test volumes (2)
    image: alpine
    volumes:
      - volume-test_${CI_PIPELINE_NUMBER}:/x
    commands:
      - ls -lah /x
      - touch /x/file2
      - ls -lah /x
```

Executing the pipeline creates a Docker volume on the host (or reuses an existing one) and binds it into the step's container.

But this volume is never removed, so such volumes accumulate on the agent over time.

Additional context

There are already quite a few issues on this or similar topics, but no common solution.

And a PoC on caching:

But I would like to see a common and consistent solution for most (if not all) of these issues.


anbraten commented 1 year ago

Some thoughts:

smainz commented 1 year ago
  • we need to make sure the user can only use his own volumes, not the ones from others / existing volumes from non-Woodpecker systems => maybe by using some prefix?

Temporary volumes should be created when a pipeline is executed. To make the names unique, we could use a prefix plus a random component: wp_volume_<some UUID per pipeline run>_<volume number in config>. This makes them unique enough and easy to recognize.
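The naming scheme above can be sketched in a few lines. This is a minimal illustration, not Woodpecker's actual implementation; `temp_volume_name` is a hypothetical helper.

```python
import uuid

def temp_volume_name(pipeline_run_id: str, volume_index: int) -> str:
    """Build a unique volume name: wp_volume_<UUID per run>_<volume number>."""
    return f"wp_volume_{pipeline_run_id}_{volume_index}"

# One UUID is generated per pipeline run, shared by all of its volumes:
run_id = str(uuid.uuid4())
print(temp_volume_name(run_id, 1))  # e.g. wp_volume_<uuid>_1
```

The fixed `wp_volume_` prefix also doubles as the marker that lets an agent tell its own volumes apart from foreign ones.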

For the other types of volumes, there could be some config that allows volumes with names / paths matching certain patterns, but this is the same problem we have now with the current volume implementation. So other types of volumes will require trusted projects for now.
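A pattern-based allow-list as suggested above could look like this. The patterns and the `volume_allowed` function are purely illustrative; a real agent-side config would define its own format.

```python
from fnmatch import fnmatch

# Hypothetical agent-side allow-list for non-temporary volumes.
ALLOWED_PATTERNS = ["/data/ci-cache/*", "shared-maven-repo"]

def volume_allowed(name_or_path: str) -> bool:
    """Return True if a requested host path or named volume matches an allowed pattern."""
    return any(fnmatch(name_or_path, pattern) for pattern in ALLOWED_PATTERNS)

print(volume_allowed("/data/ci-cache/go"))  # matches the first pattern
print(volume_allowed("/etc/passwd"))        # matches nothing, rejected
```

Anything not matching a pattern would be rejected unless the project is trusted, mirroring today's behavior.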

  • we should make sure volumes get removed after a pipeline is done or by some cleanup otherwise the agents system will be full with dangling volumes

That is the idea of temporary volumes:

```shell
docker volume create wp_volume_...._1

docker run -v wp_volume_...._1:/path/configured/in/step/for/volume/1
...
docker volume rm wp_volume_...._1
```

For other types of volumes, we have the same problem as today. Maybe a cleanup job would help.
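Such a cleanup job could use the `wp_volume_` prefix to find orphaned volumes. This is a sketch under the proposed naming scheme; in practice the agent would list volumes via the Docker API rather than taking a plain list, and the helper name is hypothetical.

```python
def stale_woodpecker_volumes(all_volumes: list, active_run_ids: set) -> list:
    """Pick out wp_volume_* volumes whose pipeline run is no longer active."""
    stale = []
    for name in all_volumes:
        if not name.startswith("wp_volume_"):
            continue  # never touch volumes Woodpecker did not create
        parts = name.split("_")
        run_id = parts[2] if len(parts) > 2 else ""
        if run_id not in active_run_ids:
            stale.append(name)
    return stale

# Volumes of finished runs are collected; foreign volumes are ignored:
print(stale_woodpecker_volumes(
    ["wp_volume_abc_1", "wp_volume_def_1", "some-user-volume"],
    {"abc"},
))
```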

  • how should a backend without volume support (local, ssh, ...) handle volumes?

Probably Woodpecker has to take the same route Drone has taken: have different config options per backend. I do not see how a local / ssh backend could support real volumes, but for those a directory with some naming convention could be used. They can write to any place the user has permissions for.
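The directory-based fallback could be as simple as mapping each pipeline-scoped volume to a run-specific directory. A minimal sketch, assuming the naming convention proposed earlier in the thread; `local_backend_volume_dir` is a hypothetical helper.

```python
import os
import tempfile

def local_backend_volume_dir(pipeline_run_id: str, volume_name: str) -> str:
    """Emulate a temporary volume on backends without volume support.

    Each pipeline-scoped volume becomes a run-specific directory
    owned by the executing user.
    """
    base = os.path.join(tempfile.gettempdir(), f"wp_volume_{pipeline_run_id}")
    path = os.path.join(base, volume_name)
    os.makedirs(path, exist_ok=True)
    return path

# All steps of a run resolve the same name to the same directory:
print(local_backend_volume_dir("run42", "my-temp-volume"))
```

Cleanup would then mean removing the whole `wp_volume_<run id>` directory after the last step finishes.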

BTW: What are people using local / ssh backends for?

  • should different / parallel steps be able to use the same volume? For k8s sharing rw volumes is normally a problem

Yes, please. Maybe we will eventually need an external volume provider for these scenarios.

  • Could we "just" place the workflow folder into a sub-directory, so the user can write to some folders like /workspace/my-cache while the repo is at /workspace/code...

My use case is to put some files in the $HOME directory (ssh keys, .npmrc, settings.xml, ...) and to share a Docker socket with a service container. Having this in the workspace does not help much, as you then have to do a lot of configuration instead of just using a third-party program off the shelf.

maxkratz commented 1 year ago

This would ease the sharing of a Docker socket with a service container. It would be a nice feature!

lafriks commented 1 year ago

Need to keep in mind that docker volumes won't help much if multiple agents are deployed on different hosts or in swarm where agents on restart can change swarm node.

Also if you are allowed to specify host path for volume that would definitely create security issues