weaveworks / ignite

Ignite a Firecracker microVM
https://ignite.readthedocs.org
Apache License 2.0
3.49k stars 224 forks source link

SIGSEGV with metadata from older version of ignite #854

Closed juozasg closed 3 years ago

juozasg commented 3 years ago

Signed-off-by: Juozas juozasgaigalas@gmail.com

Running ignite ps on a system with firecracker VMs created by ignite 0.7 causes a segfault because the /var/lib/firecracker/vm metadata does not have the expected vm.Status.Runtime.Name field. Instead ignore VMs missing this field.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1083c5b]

goroutine 1 [running]:
github.com/weaveworks/ignite/cmd/ignite/run.fetchLatestStatus(0xc00007d860, 0x2, 0x2, 0x1, 0x2, 0xc00007d860, 0x1)
    /go/src/github.com/weaveworks/ignite/cmd/ignite/run/ps.go:201 +0x11b
github.com/weaveworks/ignite/cmd/ignite/run.Ps(0xc00003af40, 0x0, 0x0)
    /go/src/github.com/weaveworks/ignite/cmd/ignite/run/ps.go:81 +0x2bc
github.com/weaveworks/ignite/cmd/ignite/cmd/vmcmd.NewCmdPs.func1.1(0xc0001964e0, 0x131db72, 0x2)
    /go/src/github.com/weaveworks/ignite/cmd/ignite/cmd/vmcmd/ps.go:60 +0x5f
github.com/weaveworks/ignite/cmd/ignite/cmd/vmcmd.NewCmdPs.func1(0xc000453340, 0x1c436e0, 0x0, 0x0)
    /go/src/github.com/weaveworks/ignite/cmd/ignite/cmd/vmcmd/ps.go:61 +0x7a
github.com/spf13/cobra.(*Command).execute(0xc000453340, 0x1c436e0, 0x0, 0x0, 0xc000453340, 0x1c436e0)
    /go/src/github.com/weaveworks/ignite/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc00051b8c0, 0xc000182000, 0x14892e0, 0xc00000e010)
    /go/src/github.com/weaveworks/ignite/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
    /go/src/github.com/weaveworks/ignite/vendor/github.com/spf13/cobra/command.go:887
main.Run(0x112f160, 0xc000040178)
    /go/src/github.com/weaveworks/ignite/cmd/ignite/ignite.go:24 +0xab
main.main()
    /go/src/github.com/weaveworks/ignite/cmd/ignite/ignite.go:13 +0x25
kingdonb commented 3 years ago

Thanks for uncovering this issue!

I don't have a lot of context here, but I think there must be a better way to handle the default case than to just move on to the next one. Probably should emit some kind of warning since there could be no other way for the user to discover these are stale resources left behind.

darkowlzz commented 3 years ago

Hi @juozasg , thanks for reporting this issue. I was able to reproduce this issue. Good catch! At first, I thought that this is a bug in the API conversion. We have API conversion logic to handle reading old API objects and converting them to the latest API object version, but in this case, we missed adding a default value for status.Runtime.Name in the conversion from v1alpha2 to the hub version (latest). So I solved the issue with the following patch

diff --git a/pkg/apis/ignite/v1alpha2/conversion.go b/pkg/apis/ignite/v1alpha2/conversion.go
index ffea1cc6..bde5004b 100644
--- a/pkg/apis/ignite/v1alpha2/conversion.go
+++ b/pkg/apis/ignite/v1alpha2/conversion.go
@@ -1,8 +1,10 @@
 package v1alpha2

 import (
-       "github.com/weaveworks/ignite/pkg/apis/ignite"
        "k8s.io/apimachinery/pkg/conversion"
+
+       "github.com/weaveworks/ignite/pkg/apis/ignite"
+       "github.com/weaveworks/ignite/pkg/runtime"
 )

 // Convert_ignite_Runtime_To_v1alpha2_Runtime calls the autogenerated conversion function along with custom conversion logic
@@ -20,6 +22,10 @@ func Convert_v1alpha2_VMStatus_To_ignite_VMStatus(in *VMStatus, out *ignite.VMSt
                out.Network = &ignite.Network{}
        }

+       if out.Runtime.Name == "" {
+               out.Runtime.Name = runtime.RuntimeContainerd
+       }
+
        // Set IPAddresses to the new position, under Network block.
        out.Network.IPAddresses = in.IPAddresses

This adds containerd as the default runtime when the runtime name is missing in the status. But this may not be accurate because we don't have a way to determine the right value of the runtime, since we didn't record it in the old API. So, although it fixes the issue, it's not so correct solution. After thinking more about it and going through the ps code, I think a better solution would be to update https://github.com/weaveworks/ignite/blob/2dbcdd6637277eaef0d253a8c1774e5736c6289c/cmd/ignite/run/ps.go#L166-L167 and ignore the VMs that don't have runtime name info. Tried it with the patch:

diff --git a/cmd/ignite/run/ps.go b/cmd/ignite/run/ps.go
index 84cd2917..3e3b5315 100644
--- a/cmd/ignite/run/ps.go
+++ b/cmd/ignite/run/ps.go
@@ -163,8 +163,8 @@ func fetchLatestStatus(vms []*api.VM) (outdatedVMs map[string]bool, errList []er

        // Iterate through the VMs, fetching the actual status from the runtime.
        for _, vm := range vms {
-               // Skip VMs with no runtime info or no runtime ID.
-               if vm.Status.Runtime == nil || vm.Status.Runtime.ID == "" {
+               // Skip VMs with insufficient runtime info.
+               if vm.Status.Runtime == nil || vm.Status.Runtime.ID == "" || vm.Status.Runtime.Name == "" {
                        continue
                }
                containerID := vm.Status.Runtime.ID

I think this is the best way to solve this in the context of ps. It would be great if you could try the same, see if it works for you and update your patch accordingly.

stealthybox commented 3 years ago

I agree that it's impossible to insinuate with the older VM API objects what the runtime value should be. Asserting the default when converting to/from v1alpha2 is wrong.

I like the graceful degradation suggested for the ignite ps code path. :+1:

stealthybox commented 3 years ago

Thanks for finding this problem!