GSoC Tracking Issue: CRI-based CSI image volume driver

ecordell commented 3 years ago

The csi-driver-image-populator is a CSI plugin that allows you to mount the contents of a container image as a volume in a container.

Example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.13-alpine
    ports:
    - containerPort: 80
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    csi:
      driver: image.csi.k8s.io
      volumeAttributes:
        image: kfox1111/misc:test

The current driver is built around buildah, which uses the registries.conf configuration to set up a connection to registries.

This creates a dichotomy between connections from the cluster to external registries when pulling images for pods vs. pulling images for volumes. Pod image pulls are configured via the cluster CRI implementation, node, and Pod config, while volume image pulls are only configured via the configuration that buildah understands.

This means that proxy configuration, cert configuration, auth information, and mirror information are not shared unless the CRI implementation also understands registries.conf (currently this is only cri-o).

The goals of this project in decreasing order of importance:

Use CRI to pull images that will be mounted as volumes
Make updates to CRI/CSI to make their integration simpler
Explore options to support OCI-artifacts directly
Move csi-driver-image-populator in-tree in kube

CRI Endpoint

There is no standard location for the CRI Endpoint - making a CRI-agnostic CSI driver requires providing the endpoint up front:

- mountPath: /var/run/dockershim.sock
  name: cri-socket-dir

or a change to CRI to allow a well-known location (e.g. /var/run/cri.sock)
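
A minimal sketch of the first option, assuming the driver accepts an optional explicit endpoint and otherwise probes a short list of commonly used socket paths. The names `find_cri_socket` and `CANDIDATE_SOCKETS` are illustrative and not taken from any existing driver; the containerd and cri-o paths are the usual defaults, but node configuration can change them.

```python
import os
import stat
from typing import Callable, Iterable, Optional

# Commonly used default locations (assumption: a node may override any of these).
CANDIDATE_SOCKETS = (
    "/var/run/dockershim.sock",
    "/run/containerd/containerd.sock",
    "/var/run/crio/crio.sock",
)


def is_unix_socket(path: str) -> bool:
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def find_cri_socket(
    explicit: Optional[str] = None,
    candidates: Iterable[str] = CANDIDATE_SOCKETS,
    probe: Callable[[str], bool] = is_unix_socket,
) -> Optional[str]:
    """Prefer the operator-supplied endpoint, else the first candidate that probes OK."""
    if explicit:
        # An explicit endpoint is never silently replaced by a guess.
        return explicit if probe(explicit) else None
    for path in candidates:
        if probe(path):
            return path
    return None
```

Probing is only a fallback here; a well-known location defined by CRI itself would make the candidate list unnecessary.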

CRI Storage Location

There is no standard location or directory structure for an image on-disk across CRI implementations. This means that once an image is on a node, there's no standard way to get its contents.

A solution that requires no modifications to CRI (mocked up here) is to:

Use CRI to start a pod with two containers and a shared volume
The first container contains a statically-linked version of cp
The second container is the target container to be used as a volume
The first container copies the contents of the shared volume out to a standardized location on the node.
The contents, now on the node, are mounted into the user-requested pod
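
The copy step in the list above can be sketched roughly as follows. The mock-up does this with a statically linked cp inside a container; the Python below only mirrors that logic for illustration. It assumes the shared volume is already visible at `shared_dir`, and `stage_image_contents`, `node_root`, and `volume_id` are hypothetical names.

```python
import shutil
from pathlib import Path


def stage_image_contents(shared_dir: str, node_root: str, volume_id: str) -> Path:
    """Copy the image contents from the shared volume to node_root/volume_id.

    Every volume gets its own full copy, so nothing is shared between
    volumes or with the runtime's image cache.
    """
    target = Path(node_root) / volume_id
    if target.exists():
        # Already staged for this volume. A real driver would also need to
        # detect and clean up partial copies.
        return target
    # symlinks=True keeps symlinks as links instead of following them.
    shutil.copytree(shared_dir, target, symlinks=True)
    return target
```

The returned directory is what would then be mounted into the user-requested pod.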

It would be nice to find an alternative that:

Doesn't require building or running a statically built cp
Doesn't require running an image via CRI to get the final output filesystem
Doesn't require the user to input the storage location / format for their CRI implementation

CRI

The CRI API is defined here

It is worth exploring what changes, if any, would make some of the above goals possible.

OCI Artifacts

Most CRI implementations do not support pulling non-runnable images.

Others

The metadata (image manifest, manifestlist, labels, etc) are all useful information for a consumer of an image, especially for OCI artifacts.

Some ideas:

CRI updated to include explicit support for non-runnable images
C*I defined for pulling / unpacking but not running
Expose metadata (annotations/labels) as data that can be mounted as well
Options for exposing the full oci manifest / manifestlist as a volume

Ananya-1106 commented 3 years ago

Hello! I am interested in learning and contributing here, but I am new to this platform. Please guide me on where I should start.

atoato88 commented 3 years ago

@Ananya-1106 Hi, thank you for your comment :tada: Please see here to start contributing.

viveksahu26 commented 3 years ago

Hey @ecordell, one question here. What does this line mean: "The current driver is built around buildah"?
As I understand it, is buildah providing storage here, or acting as a CSI plugin?

viveksahu26 commented 3 years ago

@ecordell, in the pod example above, I understand that the CSI driver provides a volume to the container. But why is it mounted from the kfox1111/misc:test image?

viveksahu26 commented 3 years ago

@ecordell, what are the benefits of mounting the contents of a container image as a volume in a container?

adi0509 commented 3 years ago

This is really cool! I am interested in this and will be applying.

kfox1111 commented 3 years ago

Interesting. Thanks for working on this issue.

There's another driver being worked on that uses CRI, located here: https://github.com/warm-metal/csi-driver-image

Some discussion around the image-populator driver and the image driver is here: https://github.com/warm-metal/csi-driver-image/issues/12

I think we all need to put our heads together for a bit and weigh the options.

I used buildah for the prototype implementation because it was portable, could use the image cache for multiple instances without consuming extra storage, and was really simple to implement.

warm-metal/csi-driver-image used CRI so that the cache could also be used, but shared with the runtime.

The cp variant described here would be portable too, but would not share any data with the image cache or between multiple instances.

I think the ideal solution would:

share the underlying storage with the runtime and between instances of volumes
work with any runtime
support writable volumes where only the changed files consume extra space
ideally be incorporated directly into k8s so it could always be relied on to be there
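
One way to get a writable volume where only changed files consume space is an overlay mount, with the unpacked image as the read-only lower layer and a per-volume upper layer. This is an assumption for illustration, since nothing here prescribes a mechanism; the paths and the helper name `overlay_mount_options` are hypothetical. `lowerdir`, `upperdir`, and `workdir` are the standard Linux overlayfs mount options.

```python
from pathlib import PurePosixPath


def overlay_mount_options(image_rootfs: str, volume_dir: str) -> str:
    """Build overlayfs mount options for a copy-on-write view of an image."""
    base = PurePosixPath(volume_dir)
    upper = base / "upper"  # receives only the files the pod changes
    work = base / "work"    # scratch dir; must be on the same filesystem as upper
    return f"lowerdir={image_rootfs},upperdir={upper},workdir={work}"


# Hypothetical paths, for illustration only.
opts = overlay_mount_options(
    "/var/lib/example/images/abc/rootfs",
    "/var/lib/example/volumes/vol-1",
)
# Would be passed as: mount -t overlay overlay -o <opts> <target>
print(opts)
```

Several volumes can share the same lower directory, so the image contents are stored once and each volume pays only for its own changes.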

kfox1111 commented 3 years ago

@viveksahu26 the reason I think the image driver is very useful is twofold.

It's another way to distribute data other than a configmap/secret or bundling data with binaries. It lets you reuse all the container registry/mirroring/scanning/signing/etc. tools for shipping data around, just like the binaries.

For example, instead of building a container that starts with nginx and adds your static website content to it, requiring a new container when nginx needs updating, you can deploy the nginx container directly, then mount your content at /var/www/nginx/html. You can then update either container without updating the other. This is especially useful when you have something like RPM mirrors, where you may want the host nginx container to support a different architecture than the RPMs inside.

Another example: nginx serving out RPM repos for different architectures. It would save needing to build many permutations like:

host arch / rpm arch
arm64 / arm64
arm64 / x86_64
x86_64 / arm64
x86_64 / x86_64

Whereas with image volumes, you'd have nginx for arm64, nginx for x86_64, an RPM repo image for x86_64, and one for arm64, saving quite a bit of space.
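
The saving can be made concrete with a small calculation. The sizes below are made-up placeholders, and the sketch ignores any layer sharing a registry or runtime might do; it only compares bundling every host/RPM combination against shipping nginx and the RPM repos as separate images.

```python
from itertools import product

HOST_ARCHES = ["arm64", "x86_64"]
RPM_ARCHES = ["arm64", "x86_64"]
NGINX_MB = 20    # hypothetical size of one nginx image
REPO_MB = 5000   # hypothetical size of one RPM repo


def bundled_total_mb() -> int:
    """One image per (host arch, rpm arch) pair, each carrying nginx plus a repo."""
    return sum(NGINX_MB + REPO_MB for _ in product(HOST_ARCHES, RPM_ARCHES))


def image_volume_total_mb() -> int:
    """One nginx image per host arch plus one repo image per rpm arch."""
    return len(HOST_ARCHES) * NGINX_MB + len(RPM_ARCHES) * REPO_MB


print(bundled_total_mb())       # 4 * (20 + 5000)
print(image_volume_total_mb())  # 2 * 20 + 2 * 5000
```

With two architectures on each side the number of images is the same in both cases; under these assumptions the difference comes from no longer storing each repo once per host architecture.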

A way to add additional modularity to assemble containers. For example, you could build a busybox image that has a statically linked binary. You could then mount that into any other container and use it without needing to change/extend the original container. This gives you more options to assemble things in pods without needing to build new containers.

glennpratt commented 3 years ago

Totally agree with @kfox1111

I operate an internal cp solution with a large number of pods and it is not working well:

Arranging to execute cp in the image means the image must be executable or you need to inject a statically linked binary. Both have annoying costs.
cp consumes a lot of IOPS, so much so that our nodes became degraded during node rotation as many new pods were executing cp simultaneously.
cp adds significant startup time (10-20 seconds for me) to pods compared to establishing a layer/snapshot on an image.
cp wastes the full size of the image for each Volume when we either need no changes (read-only) or very few (read-write snapshot).

We are testing https://github.com/warm-metal/csi-driver-image and so far using it for Pod Ephemeral RW Volumes is working well.

kitt1987 commented 3 years ago

I also agree with @kfox1111 and @glennpratt. No extra data duplication and no extra runtime overhead are requisites.

There is no standard location or directory structure for an image on-disk across CRI implementations. This means that once an image is on a node, there's no standard way to get its contents.

Though there is no unified location, we can still find it through the CRI API ImageService.ImageFsInfo, like my project bind-host did.

The directory structure varies because of the different container runtimes and their storage drivers. If we want to use those images, we need to know which container runtime is running and how it stores images. The good thing is that there are not, and will not be, many runtimes. I think this requirement is not so common, especially on less popular runtimes. We need not make working with any runtime a goal. We already know how to implement such a plugin on both containerd and cri-o.

The opposite, harder approach, which @kfox1111 may have mentioned in warm-metal/csi-driver-image#12, is to define new APIs and help runtimes implement them.

And, csi-driver-image is going to support cri-o.

k8s-triage-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 3 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-addons/issues/100#issuecomment-932859961):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues and PRs according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue or PR with `/reopen`
>- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
