weaveworks / ignite

Ignite a Firecracker microVM
https://ignite.readthedocs.org
Apache License 2.0

Enable multiple non-IP interfaces to be connected via tc redirect #836

Closed networkop closed 3 years ago

networkop commented 3 years ago

Hi @darkowlzz @stealthybox 👋 This is the PR that covers https://github.com/weaveworks/ignite/issues/832 and https://github.com/weaveworks/ignite/issues/831. At a high level it has the following impact:

1. Introduced a new CLI argument --sandbox-env-vars accepting a comma-separated list of key=value pairs.

These values are passed to the respective container runtimes and used as environment variables. I had a choice between two options: either create a new API version and add env vars as a new spec field, or pass them around in the VM's annotations. I've opted for the second option to minimise the impact of this change. I'm not sure if it's a good idea, happy to change it if necessary.

2. Introduced a new bool arg called wait to the StartVM function - if set to false, this bypasses the waitForSpawn check.

This flag defaults to true for all existing function invocations to preserve backwards compatibility. However, when used via API, users can set this to false and skip the check for ignite-spawn. The purpose is to get the container PID to configure additional interfaces before ignite-spawn is fully initialised.

3. Ignite-spawn can wait for a number of interfaces to be connected before firing up the VM.

This is controlled through an environment variable called IGNITE_INTFS. To preserve backwards compatibility it defaults to 1, so without any variables set, the behaviour is the same as now. However, if this value is set to a number greater than 1, SetupContainerNetworking will wait for that number of interfaces to be connected (up to a maximum timeout); see the sketch after this list.

4. Ignite-spawn will connect additional veth and tap interfaces via tc redirect.

For backwards compatibility, the behaviour is to always use the current way of interconnecting interfaces (via bridge). However, if there's no IP on the interface, it will be connected to the VM via tc redirect.
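
Points 3 and 4 combine into roughly the following decision logic. This is a minimal sketch, not the actual pkg/container/network.go code, with illustrative names and an assumed 30-second timeout:

package network

import (
    "fmt"
    "net"
    "os"
    "strconv"
    "time"
)

// wantedInterfaces reads IGNITE_INTFS and defaults to 1, so existing
// setups keep today's behaviour.
func wantedInterfaces() int {
    if n, err := strconv.Atoi(os.Getenv("IGNITE_INTFS")); err == nil && n > 1 {
        return n
    }
    return 1
}

// waitAndClassify polls until the expected number of non-ignored interfaces
// is present (or the timeout expires), then assigns each one a mode:
// interfaces that already carry an IP keep the existing bridge behaviour,
// IP-less ones get wired to the VM with tc redirect.
func waitAndClassify() (map[string]string, error) {
    ignored := map[string]struct{}{"lo": {}, "sit0": {}, "tunl0": {}}
    want := wantedInterfaces()
    deadline := time.Now().Add(30 * time.Second) // illustrative timeout

    for {
        intfs, err := net.Interfaces()
        if err != nil {
            return nil, err
        }
        modes := make(map[string]string)
        for _, intf := range intfs {
            if _, skip := ignored[intf.Name]; skip {
                continue
            }
            if addrs, _ := intf.Addrs(); len(addrs) > 0 {
                modes[intf.Name] = "dhcp-bridge" // current behaviour, unchanged
            } else {
                modes[intf.Name] = "tc-redirect" // new: no IP on the interface
            }
        }
        if len(modes) >= want {
            return modes, nil
        }
        if time.Now().After(deadline) {
            return nil, fmt.Errorf("timed out waiting for %d interfaces", want)
        }
        time.Sleep(time.Second)
    }
}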


In general, all these changes strive to preserve the happy-path behavior of pre-existing code, so no major changes are expected for existing users.

networkop commented 3 years ago

It seems like a number of e2e tests are failing; however, they are also failing on the main branch. I'm not sure if anyone is aware of that?

=== RUN   TestRunGitops
    gitops_test.go:134: assertion failed: error is not nil: exit status 1: cmd:
        "/home/null/ignite/bin/ignite rm -f my-vm"
        time="2021-05-11T15:14:54+01:00" level=fatal msg="unknown runtime \"\""

    gitops_test.go:134: failed to run "/home/null/ignite/bin/ignite rm -f my-vm": exit status 1

And then a number of lifecycle tests all fail when VMs are re-started, e.g.:

=== RUN   TestVMLifecycleWithDockerAndDockerBridge
    vm_lifecycle_test.go:52: assertion failed: error is not nil: exit status 1: cmd:
        "/home/null/ignite/bin/ignite start e2e-test-vm-lifecycle-docker-and-docker-bridge --runtime=docker --network-plugin=docker-bridge"
        time="2021-05-11T15:17:37+01:00" level=info msg="Networking is handled by \"docker-bridge\""
        time="2021-05-11T15:17:37+01:00" level=info msg="Started Firecracker VM \"999d41998dd1afa4\" in a container with ID \"7bce9d72f1fbbd8eb9b857dda35daf7eefec1d2faf7bff46c9b5d477341fdbf5\""
        time="2021-05-11T15:17:47+01:00" level=fatal msg="timeout waiting for ignite-spawn startup"
darkowlzz commented 3 years ago

> It seems like a number of e2e tests are failing; however, they are also failing on the main branch. I'm not sure if anyone is aware of that?

We are aware of some flaky tests, but these look like new failures; I haven't seen them fail before.

networkop commented 3 years ago

thanks @darkowlzz, I'm still working on this, so I've converted this to a draft again. I'm also on PTO next week, so let's put this on hold until Monday week. I'll try to push out some updates by the 24th.

stealthybox commented 3 years ago

@networkop, thanks so much for working on this change. I'm really excited about it and you've inspired a really good use of annotations for low-cost experiments across ignite.

networkop commented 3 years ago

brilliant, thanks for your comments @darkowlzz and @stealthybox. I think all of your suggestions make perfect sense and I'll make the necessary changes later this week. Hope to see y'all next Monday.

networkop commented 3 years ago

Hi @darkowlzz @stealthybox, it was nice to chat with you today. I've updated this PR, so most of your feedback has been addressed. I think instead of re-writing the first comment I'll post a new PR summary below to preserve some history and context.

This PR...

  1. Introduced two new optional annotations (see the sketch after this list):
     a. ignite.weave.works/extra-intfs -- can contain a comma-separated list of extra interfaces to be connected to the sandbox environment (by an external process, e.g. containerlab).
     b. ignite.weave.works/sandbox-env -- can contain a list of comma-separated key=value pairs to be passed into the sandbox as environment variables. This would allow for firecracker timeouts to be adjusted, e.g. [like here](https://github.com/networkop/containerlab/blob/ignite/runtime/ignite/ignite.go#L209).

  2. Introduced a new bool arg called wait to the StartVM function - if set to false, this bypasses the waitForSpawn check. This flag defaults to true for all existing function invocations to preserve backwards compatibility. However, when used via API, users can set this to false and skip the check for ignite-spawn. The purpose is to get the container PID to configure additional interfaces before ignite-spawn is fully initialised.

  3. Ignite-spawn can wait for a number of extra interfaces to be connected before firing up the VM. This is controlled through the ignite.weave.works/extra-intfs annotation. If present, the value of this annotation is parsed as a comma-separated list of strings, where each string is expected to match one of the netlink interfaces inside a sandbox. The VM will not start until all extra interfaces plus the main interface (eth0) are connected.

  4. Ignite-spawn will connect additional veth and tap interfaces via tc redirect. For backwards compatibility, the behavior is to always use the current way of interconnecting interfaces (via bridge). However, if there's no IP on the interface, it will be interconnected with a VM via tc redirect.
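
For illustration, here is how an API client might set the two annotations from point 1 before starting the VM. A minimal sketch, assuming GetObjectMeta() returns a mutable pointer; the interface names and the environment variable are made-up placeholders:

func annotateVM(vm *api.VM) {
    meta := vm.GetObjectMeta()
    if meta.Annotations == nil {
        meta.Annotations = map[string]string{}
    }
    // Extra interfaces that an external process (e.g. containerlab) will
    // connect to the sandbox before the VM boots.
    meta.Annotations["ignite.weave.works/extra-intfs"] = "eth1,eth2"
    // Comma-separated key=value pairs passed into the sandbox as env vars
    // (FIRECRACKER_TIMEOUT is a hypothetical example).
    meta.Annotations["ignite.weave.works/sandbox-env"] = "FIRECRACKER_TIMEOUT=120"
}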

I still haven't touched the tests or documentation, so please let me know what you'd expect on that front.

networkop commented 3 years ago

I had to make some extra changes to make the e2e tests work. Mainly, I had to expose the wait flag on the start/run commands so that an external runtime can plug in additional interfaces while the VM is starting.

networkop commented 3 years ago

@stealthybox @darkowlzz I'm not sure why the tests are failing, they pass locally. I don't have the permissions to re-run, can you trigger a re-run please?

stealthybox commented 3 years ago

> Introduced a new bool arg called wait to the StartVM function - if set to false, this bypasses the waitForSpawn check. This flag defaults to true for all existing function invocations to preserve backwards compatibility. However, when used via API, users can set this to false and skip the check for ignite-spawn. The purpose is to get the container PID to configure additional interfaces before ignite-spawn is fully initialised.

Talking through this with @darkowlzz, this has a subtle effect of changing the status start timestamp of a VM when the flag is disabled.

There are two approaches we came up with.

The one that is most similar to what you're wanting to do is to make waitForSpawn non-blocking and copy StartVM() to a new func called StartVMNonBlocking() which returns a done-channel. This would allow you as an API client to do what you want to the VM while it's starting (like add interfaces), and it could pave the way for a general CLI option to run some start-lifecycle handlers. In this case, for API compatibility, we would just wrap StartVMNonBlocking to keep the signature for the current blocking version.

The second option is to try and add formal lifecycle hooks that an API client could use. This feels much harder and sort of out of scope.

WDYT?

networkop commented 3 years ago

> Introduced a new bool arg called wait to the StartVM function - if set to false, this bypasses the waitForSpawn check. This flag defaults to true for all existing function invocations to preserve backwards compatibility. However, when used via API, users can set this to false and skip the check for ignite-spawn. The purpose is to get the container PID to configure additional interfaces before ignite-spawn is fully initialised.

> Talking through this with @darkowlzz, this has a subtle effect of changing the status start timestamp of a VM when the flag is disabled.

> There are two approaches we came up with.

> The one that is most similar to what you're wanting to do is to make waitForSpawn non-blocking and copy StartVM() to a new func called StartVMNonBlocking() which returns a done-channel. This would allow you as an API client to do what you want to the VM while it's starting (like add interfaces), and it could pave the way for a general CLI option to run some start-lifecycle handlers. In this case, for API compatibility, we would just wrap StartVMNonBlocking to keep the signature for the current blocking version.

Thanks @stealthybox, I just want to make sure I understand what you're saying. I have two different interpretations:

First one:

  1. Add two new functions, StartVMBlocking and StartVMNonBlocking, each one calling StartVM with the wait bool either on or off.
  2. Modify the StartVM signature to func StartVM(vm *api.VM, debug bool, done <-chan struct{}) error, and modify all other calling functions to wait on the done channel.

Second one:

  1. Leave the StartVM() function as it was before, and add a new function StartVMNonBlocking() with the same code but accepting an extra <-chan bool argument.
stealthybox commented 3 years ago

Ah, I can explain more clearly. The proposed StartVM() wait flag has an undesirable consequence of changing/losing track of the VM start time when you disable wait. This seems like a roundabout solution.

What is needed is for an API client to run code in-between setupContainerNetworking and waitForSpawn.

I see this retrofit to StartVM:
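
A minimal sketch of the retrofit, following the StartVMNonBlocking signature and SpawnFinished channel from the diff further down (where the VMStartChannels struct mentioned below appears as VMChannels):

type VMChannels struct {
    SpawnFinished chan error
}

// StartVM keeps the existing blocking signature for API compatibility by
// wrapping the non-blocking variant and waiting on its done-channel.
func StartVM(vm *api.VM, debug bool) error {
    vmChans, err := StartVMNonBlocking(vm, debug)
    if err != nil {
        return err
    }
    return <-vmChans.SpawnFinished
}

// StartVMNonBlocking starts the sandbox container and returns immediately,
// so an API client can mutate the sandbox (e.g. add interfaces) before
// blocking on SpawnFinished.
func StartVMNonBlocking(vm *api.VM, debug bool) (*VMChannels, error) {
    vmChans := &VMChannels{SpawnFinished: make(chan error)}
    // ... start the container and set up networking (elided) ...
    go waitForSpawn(vm, vmChans) // sends on SpawnFinished once spawn is up
    return vmChans, nil
}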

This would allow you to call StartVMNonBlocking from containerlab directly, perform your network interface mutations, and then block for the channel afterwards. It keeps the start timestamp behavior the same, and allows the containerlab code to safely handle any Spawn errors.
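
For example, the containerlab side could look roughly like this (hypothetical client code; only StartVMNonBlocking and SpawnFinished come from this PR):

func startAndWire(vm *api.VM, debug bool) error {
    // Start the VM without blocking on ignite-spawn.
    vmChans, err := operations.StartVMNonBlocking(vm, debug)
    if err != nil {
        return err
    }

    // ... perform the network interface mutations on the sandbox here ...

    // Block until ignite-spawn has finished, surfacing any Spawn errors.
    if err := <-vmChans.SpawnFinished; err != nil {
        return fmt.Errorf("ignite-spawn failed: %w", err)
    }
    return nil
}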

Using the VMStartChannels struct allows us to add more channels later on without modifying the function signature.

networkop commented 3 years ago

@darkowlzz @stealthybox PR is ready for review.

stealthybox commented 3 years ago
commit 219c8e89f2b80373cfbbf53de6c21101f24cb672
Author: leigh capili <leigh@null.net>
Date:   Mon Jun 21 10:34:48 2021 -0600

    Use set for interface modes

diff --git pkg/container/network.go pkg/container/network.go
index 2c28e99c..af00e47f 100644
--- pkg/container/network.go
+++ pkg/container/network.go
@@ -33,18 +33,20 @@ ip link set eth0 master br0
 ip link set vm0 master br0
 */

-// Array of container interfaces to ignore (not forward to vm)
+// ignoreInterfaces is a Set of container interface names to ignore (not forward to vm)
 var ignoreInterfaces = map[string]struct{}{
    "lo":    {},
    "sit0":  {},
    "tunl0": {},
 }

-var supportedModes = struct {
-   dhcp, tc string
-}{
-   dhcp: "dhcp-bridge",
-   tc:   "tc-redirect",
+const MODE_DHCP = "dhcp-bridge"
+const MODE_TC = "tc-redirect"
+
+// supportedModes is a Set of Mode strings
+var supportedModes = map[string]struct{}{
+   MODE_DHCP: {},
+   MODE_TC:   {},
 }

 var mainInterface = "eth0"
@@ -57,7 +59,7 @@ func SetupContainerNetworking(vm *api.VM) (firecracker.NetworkInterfaces, []DHCP

    // Setting up mainInterface if not defined
    if _, ok := vmIntfs[mainInterface]; !ok {
-       vmIntfs[mainInterface] = supportedModes.dhcp
+       vmIntfs[mainInterface] = MODE_DHCP
    }

    interval := 1 * time.Second
@@ -116,7 +118,7 @@ func collectInterfaces(vmIntfs map[string]string) (bool, error) {
        // default fallback behaviour to always consider intfs with an address
        addrs, _ := intf.Addrs()
        if len(addrs) > 0 {
-           vmIntfs[intf.Name] = supportedModes.dhcp
+           vmIntfs[intf.Name] = MODE_DHCP
        }
    }

@@ -127,7 +129,7 @@ func collectInterfaces(vmIntfs map[string]string) (bool, error) {
        }

        // for DHCP interface, we need to make sure IP and route exist
-       if mode == supportedModes.dhcp {
+       if mode == MODE_DHCP {
            intf := foundIntfs[intfName]
            _, _, _, noIPs, err := getAddress(&intf)
            if err != nil {
@@ -163,7 +165,7 @@ func networkSetup(fcIntfs *firecracker.NetworkInterfaces, dhcpIntfs *[]DHCPInter
        }

        switch vmIntfs[intfName] {
-       case supportedModes.dhcp:
+       case MODE_DHCP:
            ipNet, gw, err := takeAddress(intf)
            if err != nil {
                return fmt.Errorf("error parsing interface %q: %s", intfName, err)
@@ -185,7 +187,7 @@ func networkSetup(fcIntfs *firecracker.NetworkInterfaces, dhcpIntfs *[]DHCPInter
                    HostDevName: dhcpIface.VMTAP,
                },
            })
-       case supportedModes.tc:
+       case MODE_TC:
            tcInterface, err := addTcRedirect(intf)
            if err != nil {
                log.Errorf("Failed to setup tc redirect %v", err)
@@ -494,22 +496,20 @@ func maskString(mask net.IPMask) string {
 func parseExtraIntfs(vm *api.VM) map[string]string {
    result := make(map[string]string)

-   for k, v := range vm.GetObjectMeta().Annotations {
-       if !strings.HasPrefix(k, constants.IGNITE_INTERFACE_ANNOTATION) {
+   for intf, mode := range vm.GetObjectMeta().Annotations {
+       if !strings.HasPrefix(intf, constants.IGNITE_INTERFACE_ANNOTATION) {
            continue
        }

-       k = strings.TrimPrefix(k, constants.IGNITE_INTERFACE_ANNOTATION)
-       if k == "" {
+       intf = strings.TrimPrefix(intf, constants.IGNITE_INTERFACE_ANNOTATION)
+       if intf == "" {
            continue
        }

-       switch v {
-       case supportedModes.dhcp:
-           result[k] = v
-       case supportedModes.tc:
-           result[k] = v
-       default:
+       if _, ok := supportedModes[mode]; ok {
+           result[intf] = mode
+       } else {
+           log.Warnf("VM specifies unrecognized mode %q for interface %q", mode, intf)
            continue
        }
stealthybox commented 3 years ago
diff --git pkg/operations/start.go pkg/operations/start.go
index b553fc3c..06d868a1 100644
--- pkg/operations/start.go
+++ pkg/operations/start.go
@@ -153,30 +153,28 @@ func StartVMNonBlocking(vm *api.VM, debug bool) (*VMChannels, error) {
        log.Infof("Started Firecracker VM %q in a container with ID %q", vm.GetUID(), containerID)
    }

-   // TODO: This is temporary until we have proper communication to the container
-   go waitForSpawn(vm, vmChans)
-
    // Set the container ID for the VM
    vm.Status.Runtime.ID = containerID
    vm.Status.Runtime.Name = providers.RuntimeName

-   // Set the start time for the VM
-   startTime := apiruntime.Timestamp()
-   vm.Status.StartTime = &startTime
-   vm.Status.Network.Plugin = providers.NetworkPluginName
-
    // Append non-loopback runtime IP addresses of the VM to its state
    for _, addr := range result.Addresses {
        if !addr.IP.IsLoopback() {
            vm.Status.Network.IPAddresses = append(vm.Status.Network.IPAddresses, addr.IP)
        }
    }
+   vm.Status.Network.Plugin = providers.NetworkPluginName
+
+   // write the API object in a non-running state before we wait for spawn's network logic and firecracker
+   if err := providers.Client.VMs().Set(vm); err != nil {
+       return vmChans, err
+   }

-   // Set the VM's status to running
-   vm.Status.Running = true
+   // TODO: This is temporary until we have proper communication to the container
+   // It's best to perform any imperative changes to the VM object pointer before this go-routine starts
+   go waitForSpawn(vm, vmChans)

-   // Write the state changes
-   return vmChans, providers.Client.VMs().Set(vm)
+   return vmChans, nil
 }

 // verifyPulled pulls the ignite-spawn image if it's not present
@@ -202,12 +200,26 @@ func waitForSpawn(vm *api.VM, vmChans *VMChannels) {
    const timeout = 10 * time.Second
    const checkInterval = 100 * time.Millisecond

-   startTime := time.Now()
-   for time.Since(startTime) < timeout {
+   timer := time.Now()
+   for time.Since(timer) < timeout {
        time.Sleep(checkInterval)

        if util.FileExists(path.Join(vm.ObjectPath(), constants.PROMETHEUS_SOCKET)) {
-           vmChans.SpawnFinished <- nil
+           // Before we write the VM, we should REALLY REALLY re-fetch the API object from storage
+           vm, err := providers.Client.VMs().Get(vm.GetUID())
+           if err != nil {
+               vmChans.SpawnFinished <- err
+           }
+
+           // Set the start time for the VM
+           startTime := apiruntime.Timestamp()
+           vm.Status.StartTime = &startTime
+
+           // Set the VM's status to running
+           vm.Status.Running = true
+
+           // Write the state changes, send any errors through the channel
+           vmChans.SpawnFinished <- providers.Client.VMs().Set(vm)
            return
        }
    }
networkop commented 3 years ago

@darkowlzz @stealthybox I'm running into an issue after our latest waitForSpawn refactoring. Since we moved the vm.Status.Running = true assignment into a goroutine, I can no longer ignore the vmChannels, i.e. I can't do:

_, err = operations.StartVMNonBlocking(so.vm, so.Debug)

because the VM status never changes to running and, from the PoV of ignite, it remains in the following state:

VM ID                   IMAGE                           KERNEL                                  SIZE    CPUS    MEMORY          CREATED      STATUS          IPS             PORTS   NAME
a3710f7f8f76395e        weaveworks/ignite-ubuntu:latest weaveworks/ignite-kernel:5.4.108        3.0 GB  1       800.0 MB        32s ago      *Up <nil>       192.168.223.2           e2e-test-vm-multinet
WARN[0000] The symbol * on the VM status indicates that the VM manifest on disk may not be up-to-date with the actual VM status from the container runtime

I'm not sure what the best way forward is now. On the one hand, we can't move Running=true before waitForSpawn because the VM is not started; on the other hand, I need to return from StartVMNonBlocking in order to connect additional interfaces.

In fact, thinking about it, waitForSpawn may never complete when invoked from the CLI with the vmChannels ignored: when main() returns, all of the child goroutines are killed, so the final timestamp/state changes may never get done.
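
The underlying Go behaviour, in a self-contained form (nothing ignite-specific here):

package main

import (
    "fmt"
    "time"
)

func main() {
    go func() {
        // Stand-in for waitForSpawn polling for the prometheus socket.
        time.Sleep(100 * time.Millisecond)
        fmt.Println("final status write") // never runs
    }()
    // main returns immediately; the process exits and the goroutine above
    // is killed before it can finish its work.
}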

What I've done for now is move the vm.Status.Running = true assignment back into the StartVMNonBlocking function. This would mean that the VM transitions to the running state earlier, but the timestamp will still be correct. It feels a bit wrong. Almost as if we need some new state, not just running true/false but something in the middle to indicate that it has been started but may not be running yet.

WDYT?

networkop commented 3 years ago

@darkowlzz @stealthybox can you have a look at the latest commit? I've re-worked one of the e2e tests to use the ignite API instead of the CLI. This would allow us to drop the --wait CLI flag (for the reasons above) and wait on the vmChannels before doing exec. If you think it looks alright, I'll update the remaining two tests in a similar way.

stealthybox commented 3 years ago

@networkop I think this is a good way forward for the tests 👍

Perhaps we could copy the waitForSpawn code into containerlab? I think the better structural change is to support the channels in containerlab.

networkop commented 3 years ago

Cool, I think this is it then; we've got all the changes from yesterday merged. The two major deviations are:

what do y'all think?

networkop commented 3 years ago

@stealthybox maybe. I had an idea of adding an extra func to clab's runtime interface, something like waitForWarmup, that will make sure all of the individual containers have started. I need to see how it will work with the rest of the runtimes.
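
A rough, purely hypothetical sketch of that idea (containerlab's actual runtime interface isn't shown in this thread; WaitForWarmup is just the name floated above):

package runtime

import "context"

// ContainerRuntime sketches containerlab's runtime interface with the
// proposed addition; all existing methods are elided.
type ContainerRuntime interface {
    // ... existing containerlab runtime methods ...

    // WaitForWarmup blocks until every container managed by this runtime
    // has finished starting (for ignite: each VM's SpawnFinished has fired).
    WaitForWarmup(ctx context.Context) error
}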

stealthybox commented 3 years ago

Nice! Shall I purchase some ice-cream cake to celebrate?

networkop commented 3 years ago

brilliant! thanks @darkowlzz and @stealthybox for all the support 🙏