warm-metal / container-image-csi-driver

Kubernetes CSI driver for mounting images
MIT License

Status of this project #12

Closed sgandon closed 9 months ago

sgandon commented 3 years ago

Sorry for using an issue to ask questions, but GitHub Discussions are not enabled on this repository. We (as a company) are very interested in this project. We have a use case where some components are packaged as Docker images and need to be shared between many, many pods, and we want to keep the flexibility to update those images independently of their consuming pods (requiring a restart is acceptable, though). We have not yet found any satisfactory solution for this use case, and this project is promising. We have also looked at https://github.com/kubernetes-csi/csi-driver-image-populator, but it is clearly stated to be experimental. I wanted to know: What is the status of this project? Is it being used anywhere? Will there be any releases? Thanks for your answers.
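For context, the consuming side of such a driver is simply a pod that declares the image as an inline CSI volume. Below is a minimal sketch using the Kubernetes Go types; the driver name csi-image.warm-metal.tech and the image volume attribute are assumptions to verify against this repository's README.

```go
package manifests

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// imageConsumerPod mounts a container image as a read-only volume via an
// inline (ephemeral) CSI volume. The driver name and the "image" volume
// attribute are assumptions to check against this repository's README.
func imageConsumerPod() *corev1.Pod {
	readOnly := true
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "image-consumer"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:    "shell",
				Image:   "docker.io/library/busybox:latest",
				Command: []string{"sleep", "infinity"},
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "shared-image",
					MountPath: "/shared",
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "shared-image",
				VolumeSource: corev1.VolumeSource{
					CSI: &corev1.CSIVolumeSource{
						Driver:   "csi-image.warm-metal.tech", // assumed driver name
						ReadOnly: &readOnly,
						VolumeAttributes: map[string]string{
							// the image whose filesystem appears under /shared
							"image": "docker.io/library/alpine:3.18",
						},
					},
				},
			}},
		},
	}
}
```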

kfox1111 commented 3 years ago

I'd be interested in the answer too. Also, I wrote the initial implementation of https://github.com/kubernetes-csi/csi-driver-image-populator

I haven't had much time to work on it of late, though, which is why it's still experimental. I'm mostly a user, and I mostly just want the feature to "just work" too, but it wasn't gaining any traction until I started coding something.

I'm curious whether this project and https://github.com/kubernetes-csi/csi-driver-image-populator could be merged, ultimately leading to a universally useful, stable driver.

The image-populator version just uses buildah at the moment, and I spent a couple of KubeCons trying to chat with the maintainers of CRI to see if CRI could be extended with enough API surface to let image-populator be neutral to the runtime. I didn't get a clear answer, but the clearest answer was that CRI wouldn't gain the extra features, and that maybe a new abstraction API should be written.

So that's kind of where I think we are: image-populator is neutral in the sense that it works with any runtime, but it kind of duplicates runtime functionality, which isn't ideal. This driver looks nice in that it uses CRI, but it still has runtime-specific bits, so it probably won't work right on Kata or CRI-O or whatever?

Maybe we just need to add a pluggable driver/API and then implement a few drivers? We could implement a buildah fallback driver for those runtimes that don't otherwise have a driver?
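As a sketch of that pluggable-backend idea, the abstraction could be as small as a single Go interface. Everything below is hypothetical and taken from neither project:

```go
package backend

import "context"

// Mounter abstracts how an image's root filesystem is materialized on a
// node, so runtime-specific implementations (containerd, CRI-O, ...) and
// a buildah fallback can sit behind one API. All names here are
// hypothetical and not taken from either project.
type Mounter interface {
	// Mount makes imageRef's root filesystem available at target.
	Mount(ctx context.Context, imageRef, target string, readOnly bool) error
	// Unmount tears down target and discards any writable layer.
	Unmount(ctx context.Context, target string) error
}

// backends holds the registered implementations; a driver could probe
// the node's runtime at startup and fall back to a buildah-based
// Mounter when no native backend matches.
var backends = map[string]Mounter{}

// Register makes a backend selectable by name.
func Register(name string, m Mounter) { backends[name] = m }
```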

kitt1987 commented 3 years ago

Hi @sgandon, nice to meet you here. Questions are welcome, don't worry.

What is the status of this project?

This project is one of my side projects. It doesn't belong to any company or team yet.

This plugin supports the core functionality of another side project of mine, kubectl-dev, so it is not experimental. And I will spend at least half of my time on these projects.

Feature-wise, the plugin currently supports only containerd; I may consider supporting podman in the future.

Is it being used anywhere?

Yes, but rarely. Another contributor, @glennpratt, is helping with testing, but we haven't discussed how he wants to use it.

My critical scenario is debugging images: mounting a failed image into a pod that provides bash or zsh, so users can exec into that pod and figure out the causes. For that, the mounted image should be writable, and changes should be discarded after unmounting. (I am also considering implementing the snapshot interface to allow users to commit changes to a new image.) This case is also special in that it assumes only one user is debugging a given image at a time, so sharing the same mutable snapshot becomes a corner case.
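On containerd, this writable-but-discardable behavior maps naturally onto snapshots. Here is a rough sketch with the containerd Go client, assuming the image is already pulled and unpacked; it shows only the shape of the calls, not this driver's actual code:

```go
package debugmount

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/mount"
	"github.com/containerd/containerd/namespaces"
	"github.com/opencontainers/image-spec/identity"
)

// mountWritable prepares a fresh writable snapshot on top of an
// already-unpacked image and mounts it at target. Removing the snapshot
// afterwards discards every change, matching the debug use case.
func mountWritable(ctx context.Context, client *containerd.Client, ref, key, target string) error {
	ctx = namespaces.WithNamespace(ctx, "k8s.io")

	image, err := client.GetImage(ctx, ref)
	if err != nil {
		return err
	}
	// The chain ID of the image's layers names the parent snapshot.
	diffIDs, err := image.RootFS(ctx)
	if err != nil {
		return err
	}
	parent := identity.ChainID(diffIDs).String()

	// Prepare an active (writable) snapshot keyed per volume/pod.
	sn := client.SnapshotService(containerd.DefaultSnapshotter)
	mounts, err := sn.Prepare(ctx, key, parent)
	if err != nil {
		return err
	}
	return mount.All(mounts, target)
}
```

Unmounting the target and removing the snapshot (sn.Remove(ctx, key)) then throws the writable layer away; committing it to a new image instead is what the snapshot interface mentioned above would add.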

Will there be any releases?

Yes, but I can't guarantee that the first release will come in the next few weeks or months.

kitt1987 commented 3 years ago

@kfox1111, glad to see you here. The csi-driver-image-populator project was also my candidate before starting this project.

It is almost 3 am now; I think I can only reply to you a few hours later.

kitt1987 commented 3 years ago

I'm back. @kfox1111

I just read the code of csi-driver-image-populator and buildah, and tried to remember why I started this project instead of using csi-driver-image-populator. Finally, I remembered.

The most important reason is that buildah cannot share the image store with container runtimes, which means I would need an image registry to make it work. And intuitively, images should be shared. My expected workflow is: build an image -> run it -> get failures -> mount the image and debug -> exit the debugger pod -> fix bugs and build a new image. It doesn't make sense to ask users to run a registry and constantly push and pull unstable images. So, I decided to build a new plugin to meet my needs.

probably won't work right on Kata or CRI-O or whatever?

Docker and containerd provide enough APIs to achieve my goals, and both of them are popular enough, so they are supported first. Other runtimes should be supported as well, at least CRI-O/podman. Though I haven't started yet, it may be difficult or even impossible, because it is likely that only the CRI can be used.

Maybe we just need to add a pluggable driver/API and then implement a few drivers? We could implement a buildah fallback driver for those runtimes that don't otherwise have a driver?

buildah is good enough as a driver for sharing read-only images. Whether pluggable drivers are useful may depend on how many drivers can actually be invented.

kfox1111 commented 3 years ago

I believe buildah does share the image store with container runtimes, but only with CRI-O at the moment. Totally agree that's sub-optimal for your use case. Really, I used buildah for the implementation just because it took very little code to get to a proof of concept of using an image as a volume. I also wanted to get the driver out as fast as possible so it would hopefully spread widely, and Helm chart writers would start taking advantage of the benefits that using containers to transfer data/scripts could have. Leaving it marked experimental didn't really let that happen...

Full CRI support for these features would really be best, but like I said before, that wasn't really an option. This project seems to be going down the right path towards being as portable as possible with its CRI usage, which is good. It just may need some kind of driver mechanism for additional runtimes.

There was a third option kicked around that maybe we should explore too. CRI by itself isn't able to do what we need, I think, but maybe we could go up a level and still solve it portably across runtimes. The idea was: maybe we add a controller that looks for pods with the image driver specified. It then launches another pod in a driver namespace on that node that corresponds to the "volume", with a busybox or something injected to get it to run; or maybe it mutates the user's pod to add an extra container to the run for the volume. Then the driver takes the root mount of the extra container and bind-mounts it to the driver location so it can be picked up by the pod. Maybe there is a useful idea there, maybe not.
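The final bind-mount step of that idea is runtime-neutral. A minimal sketch on Linux, with all names hypothetical:

```go
package bindmount

import "golang.org/x/sys/unix"

// bindRootfs bind-mounts a helper container's root filesystem into the
// path where the kubelet expects the CSI volume. MS_RDONLY is ignored on
// the initial bind, so making it read-only takes a second remount pass.
func bindRootfs(rootfs, target string) error {
	if err := unix.Mount(rootfs, target, "", unix.MS_BIND, ""); err != nil {
		return err
	}
	return unix.Mount("", target, "", unix.MS_BIND|unix.MS_REMOUNT|unix.MS_RDONLY, "")
}
```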

FrenchBen commented 2 years ago

Bump: have there been any other discussions about this in sig-storage?

kitt1987 commented 2 years ago

Bump: have there been any other discussions about this in sig-storage?

I guess not.

kfox1111 commented 2 years ago

Not that I've heard. I would really like to see a production-grade, image-based CSI driver become standardized.

kitt1987 commented 2 years ago

Maybe not that many people need this kind of driver.

ensonic commented 2 years ago

Btw, I think the comment from Apr 22, 2021 could go into the README.

blkperl commented 1 year ago

For those wondering about the production readiness of this project: we've been running it on nine production clusters for over a year now, with the largest cluster having 272 nodes and 4,721 pods using the driver.

webD97 commented 1 year ago

A driver like this would also enable important use cases in our project. I would love to see it evolve further, and I'd happily contribute (though I'd need to get familiar with the CSI stuff first).

It's also good to hear that there's production experience already :)