weaveworks / ignite

Ignite a Firecracker microVM
https://ignite.readthedocs.org
Apache License 2.0

How to re-run stopped VMs? #504

Open shinebayar-g opened 4 years ago

shinebayar-g commented 4 years ago
  1. Why doesn't rebooting VMs work?
  2. How can I restart stopped VMs?
    # sudo ignite vm start my-vm
    FATA[0000] failed to start container for VM "989923904ae4eb43": task must be stopped before deletion: running: failed precondition

stealthybox commented 4 years ago

Thanks for reporting. This is probably a bug. We should write an e2e test and see when it was introduced.

kobayashi commented 4 years ago

Which ignite version are you using? In my environment, stop/start works fine.

$ sudo ignite run weaveworks/ignite-ubuntu --ssh --name my-vm
INFO[0001] Created VM with ID "ba6509c56933aa45" and name "my-vm" 
INFO[0001] Networking is handled by "cni"               
INFO[0001] Started Firecracker VM "ba6509c56933aa45" in a container with ID "ignite-ba6509c56933aa45" 
$ sudo ignite stop my-vm
INFO[0000] Removing the container with ID "ignite-ba6509c56933aa45" from the "cni" network 
INFO[0000] Stopped VM with name "my-vm" and ID "ba6509c56933aa45" 
$ sudo ignite start my-vm
INFO[0000] Networking is handled by "cni"               
INFO[0000] Started Firecracker VM "ba6509c56933aa45" in a container with ID "ignite-ba6509c56933aa45" 
$ sudo ignite vm ps
VM ID             IMAGE                            KERNEL                            SIZE    CPUS  MEMORY    CREATED  STATUS  IPS         PORTS  NAME
ba6509c56933aa45  weaveworks/ignite-ubuntu:latest  weaveworks/ignite-kernel:4.19.47  4.0 GB  1     512.0 MB  23s ago  Up 8s   10.61.0.32         my-vm
$ sudo ignite version
Ignite version: version.Info{Major:"0", Minor:"6+", GitVersion:"v0.6.0-235+4b20e4f5cd8c58-dirty-dirty", GitCommit:"4b20e4f5cd8c582daf7fb8cbc0359d3ccab6b5bb", GitTreeState:"dirty", BuildDate:"2020-04-20T20:29:28Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64"}
Firecracker version: v0.21.1
Runtime: containerd

stealthybox commented 4 years ago

Thanks for confirming on recent versions of ignite @kobayashi.

I think this issue may be related to Firecracker's special handling of OS reboots. If I recall correctly, a reboot simply shuts down the guest.

It would be nice if you could ignite vm start these guests again. That doesn't appear to work right now with a build off of master, although the error message I get is about IP allocation, which is different from @shinebayar-g's:

sudo ignite run weaveworks/ignite-ubuntu \
          --name test-reboot --ssh
INFO[0000] Created VM with ID "d81cff4cad6db24c" and name "test-reboot"
INFO[0001] Networking is handled by "cni"
INFO[0001] Started Firecracker VM "d81cff4cad6db24c" in a container with ID "ignite-d81cff4cad6db24c"

sudo ignite exec test-reboot echo hi
hi

sudo ignite exec test-reboot reboot
ERRO[0000] failed to run shell command: wait: remote command exited without exit status or exit signal

sudo ignite exec test-reboot echo down
FATA[0000] VM "d81cff4cad6db24c" is not running

sudo ignite vm start test-reboot
ERRO[0000] failed to setup network for namespace "ignite-d81cff4cad6db24c": failed to allocate for range 0: 10.61.0.3 has been allocated to ignite-d81cff4cad6db24c, duplicate allocation is not allowed
FATA[0000] failed to allocate for range 0: 10.61.0.3 has been allocated to ignite-d81cff4cad6db24c, duplicate allocation is not allowed

sudo ignite vm rm test-reboot
INFO[0000] Removing the container with ID "ignite-d81cff4cad6db24c" from the "cni" network
INFO[0000] Removed VM with name "test-reboot" and ID "d81cff4cad6db24c"
ignite version
Ignite version: version.Info{Major:"0", Minor:"6+", GitVersion:"v0.6.0-264+ae1cd8a48d9372", GitCommit:"ae1cd8a48d937235f0e36923a5bbb0028d02d5d4", GitTreeState:"clean", BuildDate:"2020-05-18T23:34:12Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64", SandboxImage:version.Image{Name:"weaveworks/ignite", Tag:"v0.6.0-264-ae1cd8a48d9372", Delimeter:":"}, KernelImage:version.Image{Name:"weaveworks/ignite-kernel", Tag:"4.19.47", Delimeter:":"}}
Firecracker version: v0.21.1
Runtime: containerd

stealthybox commented 4 years ago

The containerd lifecycle is still failing with the most recent network lifecycle change (c17a99c7bbff8b9d1e96594e5f6356de61bb98fd). On the ignite dev call, we discussed checking in some e2e tests with the goal of fixing these issues.

docker+docker-bridge and docker+CNI work better across most state changes, so we can already test against them and identify behavioral differences.

Ignite run > stop > start > stop > start

sudo bin/ignite run weaveworks/ignite-ubuntu --ssh --name my-vm
INFO[0001] Created VM with ID "98da4c9faf220af1" and name "my-vm"
INFO[0001] Networking is handled by "cni"
INFO[0001] Started Firecracker VM "98da4c9faf220af1" in a container with ID "ignite-98da4c9faf220af1"
INFO[0002] Waiting for the ssh daemon within the VM to start...

sudo bin/ignite stop my-vm
INFO[0000] Removing the container with ID "ignite-98da4c9faf220af1" from the "cni" network
INFO[0001] Stopped VM with name "my-vm" and ID "98da4c9faf220af1"

sudo bin/ignite start my-vm
INFO[0000] Networking is handled by "cni"
INFO[0000] Started Firecracker VM "98da4c9faf220af1" in a container with ID "ignite-98da4c9faf220af1"
FATA[0010] timeout waiting for ignite-spawn startup

sudo bin/ignite stop my-vm
WARN[0000] VM "98da4c9faf220af1" is not running but trying to cleanup networking for stopped container
INFO[0000] Removing the container with ID "ignite-98da4c9faf220af1" from the "cni" network
WARN[0000] Failed to cleanup networking for stopped container VM "98da4c9faf220af1": failed to Statfs "/proc/5765/ns/net": no such file or directory
FATA[0000] failed to Statfs "/proc/5765/ns/net": no such file or directory

sudo bin/ignite start my-vm
ERRO[0000] failed to setup network for namespace "ignite-98da4c9faf220af1": failed to allocate for range 0: 10.61.0.21 has been allocated to ignite-98da4c9faf220af1, duplicate allocation is not allowed
FATA[0000] failed to allocate for range 0: 10.61.0.21 has been allocated to ignite-98da4c9faf220af1, duplicate allocation is not allowed
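
The run > stop > start > stop > start sequence above could be scripted as a rough regression check, in the spirit of the e2e tests discussed on the dev call. This is only a sketch: the IGNITE variable and the lifecycle function are hypothetical names, not part of Ignite, and the script uses only the subcommands and flags already shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: IGNITE and lifecycle are illustrative names, not part of Ignite.
# Override IGNITE to test a local build, e.g. IGNITE="sudo bin/ignite".
IGNITE="${IGNITE:-sudo ignite}"

# Run a VM, then cycle it through stop > start > stop > start.
# Reports the first failing step on stderr and returns non-zero.
lifecycle() {
    name="$1"
    # $IGNITE is intentionally unquoted so "sudo ignite" splits into two words.
    $IGNITE run weaveworks/ignite-ubuntu --ssh --name "$name" || {
        echo "FAILED at: run" >&2
        return 1
    }
    for step in stop start stop start; do
        $IGNITE "$step" "$name" || {
            echo "FAILED at: $step" >&2
            return 1
        }
    done
    echo "lifecycle OK for $name"
}
```

Calling lifecycle my-vm with the containerd runtime would be expected to stop at the first start, given the timeout shown above. Running the same function against the docker runtime is one way to surface the behavioral differences mentioned earlier.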

Out-of-band/VM-internal reboot

sudo bin/ignite run weaveworks/ignite-ubuntu --ssh --name test-reboot2

sudo bin/ignite exec test-reboot2 echo hi
hi

sudo bin/ignite exec test-reboot2 reboot

sudo bin/ignite exec test-reboot2 echo down
FATA[0000] VM "dfed6c8f745a1833" is not running

sudo bin/ignite vm start test-reboot2
ERRO[0000] failed to setup network for namespace "ignite-dfed6c8f745a1833": failed to allocate for range 0: 10.61.0.19 has been allocated to ignite-dfed6c8f745a1833, duplicate allocation is not allowed
FATA[0000] failed to allocate for range 0: 10.61.0.19 has been allocated to ignite-dfed6c8f745a1833, duplicate allocation is not allowed

sudo bin/ignite vm stop test-reboot2
WARN[0000] VM "dfed6c8f745a1833" is not running but trying to cleanup networking for stopped container
INFO[0000] Removing the container with ID "ignite-dfed6c8f745a1833" from the "cni" network

sudo bin/ignite vm start test-reboot2
FATA[0000] failed to start container for VM "dfed6c8f745a1833": task must be stopped before deletion: running: failed precondition

sudo bin/ignite stop test-reboot2
WARN[0000] VM "dfed6c8f745a1833" is not running but trying to cleanup networking for stopped container
INFO[0000] Removing the container with ID "ignite-dfed6c8f745a1833" from the "cni" network
WARN[0000] Failed to cleanup networking for stopped container VM "dfed6c8f745a1833": failed to Statfs "/proc/5021/ns/net": no such file or directory
FATA[0000] failed to Statfs "/proc/5021/ns/net": no such file or directory

sudo bin/ignite start test-reboot2
INFO[0000] Networking is handled by "cni"
INFO[0000] Started Firecracker VM "dfed6c8f745a1833" in a container with ID "ignite-dfed6c8f745a1833"
FATA[0010] timeout waiting for ignite-spawn startup
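
The transcripts in this thread show four distinct failure messages. When comparing runtimes and network plugins, it may help to bucket a log line into one of them. A minimal sketch, assuming the error strings stay exactly as logged above; classify_failure and the bucket labels are hypothetical names, not Ignite terminology.

```shell
#!/bin/sh
# Hypothetical helper: maps one line of ignite output to a failure mode
# seen in this thread. The labels are illustrative only.
classify_failure() {
    case "$1" in
        *"duplicate allocation is not allowed"*)      echo "duplicate-ip-allocation" ;;
        *"task must be stopped before deletion"*)     echo "task-not-stopped" ;;
        *"timeout waiting for ignite-spawn startup"*) echo "spawn-timeout" ;;
        *"failed to Statfs"*"/ns/net"*)               echo "netns-path-missing" ;;
        *)                                            echo "unknown" ;;
    esac
}
```

For example, classify_failure 'FATA[0010] timeout waiting for ignite-spawn startup' prints spawn-timeout, and any line that matches none of the patterns prints unknown.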