weaveworks / ignite

Ignite a Firecracker microVM
https://ignite.readthedocs.org
Apache License 2.0

race condition in downloading kernels #744

Open lukemarsden opened 3 years ago

lukemarsden commented 3 years ago

My ignite host got into this state just by starting some VMs concurrently when the kernel image hadn't been downloaded yet:

ubuntu@ns1003380:~/testfaster/backend$ sudo ignite kernel list
KERNEL ID               NAME                                    CREATED SIZE    VERSION
5c8a25f05dcda98d        quay.io/testfaster/ignite-kernel:latest 3h38m   62.1 MB 5.4.43
d10326e42643db25        quay.io/testfaster/ignite-kernel:latest 3h38m   62.1 MB 5.4.43

Then of course no VMs can start because of this error:

2020/12/09 11:52:06 pool-9601c0df845c4a76be90a1aa1bef3c3f3368ef516dff7238fdb07a7e34887f5b: time="2020-12-09T11:52:06Z" level=fatal msg="ambiguous kernel query: \"quay.io/testfaster/ignite-kernel:latest\" matched the following IDs/names: quay.io/testfaster/ignite-kernel:latest, quay.io/testfaster/ignite-kernel:latest"

Ignite really shouldn't allow multiple kernels (or images) to be created with the same name, even if they are being downloaded/injected simultaneously (overlapping or in parallel with each other).
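
To check whether a host has already ended up in this state, the listing can be scanned for repeated names. This is only a sketch: `duplicate_names` is a hypothetical helper, and it assumes the column layout shown in the `ignite kernel list` output above (ID first, NAME second).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ignite.
# Reads `ignite kernel list` (or `ignite image list`) output on stdin and
# prints every NAME that appears on more than one row.
# Assumes the layout shown above: one header line, then ID and NAME as the
# first two whitespace-separated fields.
duplicate_names() {
  awk 'NR > 1 && NF >= 2 { count[$2]++ }
       END { for (name in count) if (count[name] > 1) print name }'
}
```

Usage would be something like `sudo ignite kernel list | duplicate_names`; any name it prints is one that a later query would presumably report as ambiguous.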

Ignite version: version.Info{Major:"0", Minor:"8", GitVersion:"v0.8.0", GitCommit:"77f6859fa4f059f7338738e14cf66f5b9ec9b21c", GitTreeState:"clean", BuildDate:"2020-11-09T20:50:50Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64", SandboxImage:version.Image{Name:"weaveworks/ignite", Tag:"v0.8.0", Delimeter:":"}, KernelImage:version.Image{Name:"weaveworks/ignite-kernel", Tag:"4.19.125", Delimeter:":"}}
Firecracker version: v0.21.1
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"

hunhoffe commented 3 years ago

I've seen similar states (ambiguous kernel, and ambiguous image) emerge from concurrent commands with v0.7.1 (I haven't updated to 0.8.0 yet). I would be very grateful if this behavior was fixed, because it was causing me some grief!

darkowlzz commented 3 years ago

Thanks for reporting this issue. This is a known issue and a limitation of ignite's client-only model. We have discussed it in past weekly dev calls. For other concurrency-related issues, for example where a resource that's required to create a VM is busy, we create a lock file to coordinate multiple VM creation processes. But for images, following the same model would mean creating a lock file for every unique image, to avoid multiple processes pulling the same image. The current ignite image store has some drawbacks and we are looking for a good solution to such problems. One possible solution could be a client-server model. Once we have a server/daemon component that handles most of the ignite back-end work, such problems will go away.

Related issue: https://github.com/weaveworks/ignite/issues/559
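
For illustration, the "one lock file per unique image" idea mentioned above can be approximated from outside ignite with `flock(1)` from util-linux. This is a sketch under assumptions, not how ignite implements its own locking: `LOCK_DIR`, `lock_path_for` and `with_image_lock` are hypothetical names, and it only helps if every caller on the host goes through the same wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of ignite.
# Directory holding one lock file per image reference (assumed location).
LOCK_DIR="${LOCK_DIR:-/tmp/ignite-import-locks}"

# Map an image reference to a lock file path. Every character outside
# [A-Za-z0-9._-] becomes "_" so the reference is safe as a file name.
lock_path_for() {
  local safe
  safe=$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')
  printf '%s/%s.lock\n' "$LOCK_DIR" "$safe"
}

# Run a command while holding the exclusive lock for one image reference.
# Callers using the same reference run one at a time; callers using
# different references do not block each other.
with_image_lock() {
  local ref=$1
  shift
  mkdir -p "$LOCK_DIR"
  flock "$(lock_path_for "$ref")" "$@"
}
```

A caller might then run `with_image_lock quay.io/testfaster/ignite-kernel:latest ignite kernel import quay.io/testfaster/ignite-kernel:latest`. Note that the lock only serializes the imports; whether a second import of an already-present kernel is harmless is not confirmed in this thread.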

stealthybox commented 3 years ago

Just to state things clearly, the current workaround for this is to import all dependent OS and kernel images before you start your concurrent workloads.

For example:

ignite kernel import quay.io/testfaster/ignite-kernel:latest
ignite image import weaveworks/ignite-ubuntu:20.04

ignite run weaveworks/ignite-ubuntu:20.04 --name vm-1 --kernel-image quay.io/testfaster/ignite-kernel:latest  &
ignite run weaveworks/ignite-ubuntu:20.04 --name vm-2 --kernel-image quay.io/testfaster/ignite-kernel:latest  &
ignite run weaveworks/ignite-ubuntu:20.04 --name vm-3 --kernel-image quay.io/testfaster/ignite-kernel:latest  &
ignite run weaveworks/ignite-ubuntu:20.04 --name vm-4 --kernel-image quay.io/testfaster/ignite-kernel:latest  &
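
The same workaround can be wrapped in a small script so that the imports always finish before any VM is started. This is a minimal sketch: `preimport_then_run` and the `IGNITE` override variable are hypothetical, while the `ignite` subcommands and flags are exactly the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the workaround above, not part of ignite.
IGNITE="${IGNITE:-ignite}"   # assumed override, e.g. a stub for dry runs
KERNEL="quay.io/testfaster/ignite-kernel:latest"
IMAGE="weaveworks/ignite-ubuntu:20.04"

# Import the kernel and OS image once, serially, then start N VMs in
# parallel and wait for all of the `ignite run` commands to return.
preimport_then_run() {
  local count=$1 i
  # Stop early if either import fails, so no VM races to pull the image.
  "$IGNITE" kernel import "$KERNEL" || return 1
  "$IGNITE" image import "$IMAGE" || return 1
  for i in $(seq 1 "$count"); do
    "$IGNITE" run "$IMAGE" --name "vm-$i" --kernel-image "$KERNEL" &
  done
  wait
}
```

Calling `preimport_then_run 4` corresponds to the four `ignite run` lines above; setting `IGNITE=echo` first prints the commands instead of running them.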