rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

Epic: Incorporate gvisor into Rancher Desktop's networking stack #3810

Closed Nino-K closed 1 year ago

Nino-K commented 1 year ago

As part of a continuous effort to make Rancher Desktop's networking stack more robust, we are considering integrating gvisor's networking layer into the project. A main goal of this integration is a consistent networking-layer implementation across all supported platforms (Windows, macOS, Linux). It would also give us a more manageable code base and make feature development easier.

Previously, we implemented processes (on WSL) to tackle some of the DNS and VPN issues we were seeing. This work architecturally aligns with past work on the Rancher Desktop Host Resolver and Vtunnel. We will leverage AF_VSOCK as the main communication bus between the host and the VM.

In this newly proposed architecture, there are two main processes: one runs in the VM (vm-switch) and the other on the host (host-switch.exe). Both processes will be maintained in the same project (Rancher Desktop Switch or Rancher Desktop Networking; we are open to other suggestions).

The two processes communicate over AF_VSOCK. The VMSwitch picks up all traffic destined for a tap device, created at startup, and forwards it as ethernet frames to the host daemon, the HostSwitch. The HostSwitch reconstructs the ethernet frames and hands the payloads off to the operating system via syscalls. The HostSwitch is also responsible for maintaining both internal (host to VM) and external (host to the internet) connections.

Furthermore, the host daemon (HostSwitch) will also act as a DNS server (more precisely, a stub resolver).

The main focus areas for this newly proposed architecture are as follows:

The following diagram demonstrates the communication flow between the host and the VM:

flowchart  LR;
 subgraph Host["HOST"]
 subgraph hostSwitch["Host Switch"]
 vsockHost{"Host Daemon \nlistens for incoming connections"}
 eth(("reconstruct ETH frames"))
 syscall(("OS syscall"))
 dhcp["DHCP"]
 dns["DNS"]
 api["API"]
 portForwarding["Port Forwarding"]
 vsockHost <----> eth
 eth <----> syscall
 vsockHost ----> dhcp
 vsockHost ----> dns
 vsockHost ----> portForwarding
 vsockHost ----> api
 end
 end
 subgraph VM["VM"]
 subgraph vmSwitch["VM Switch"]
 vsockVM{"VM Daemon"}
 ethVM(("listens for ETH frames\n from TAP Device"))
 tapDevice("eth1")
 tapDevice <----> ethVM
 ethVM <----> vsockVM
 end
 end
 vsockVM  <---> |AF_VSOCK| vsockHost

More up-to-date diagrams can be found here: https://github.com/rancher-sandbox/rancher-desktop-networking

Stories

Release notes

TBD

Documentation

Please note that this is an experimental feature and will be available in the 1.8.0 release of Rancher Desktop. The feature is currently only available on Windows, and it replaces the underlying networking mechanism used by Rancher Desktop. Once enabled, it addresses some of the historical DNS/routing issues that users observed when running Rancher Desktop behind corporate VPNs.

This feature can be enabled using the rdctl set command, e.g.:

rdctl set --experimental.virtual-machine.networking-tunnel=true

Once the command is executed, settings.json should be populated with the correct setting for this feature:

C:\Users\[UserName]\AppData\Roaming\rancher-desktop\settings.json

The networkingTunnel configuration is currently nested under the experimental.virtualMachine parent object. Below are sample settings that demonstrate networkingTunnel when enabled.

{
  "version": 6,
  "containerEngine": {
    "name": "moby",
    "allowedImages": {
      "enabled": false,
      "locked": false,
      "patterns": []
    }
  },
  "kubernetes": {
    "version": "1.25.6",
    "port": 6443,
    "enabled": true,
    "options": {
      "traefik": true,
      "flannel": true
    }
  },
  "portForwarding": {
    "includeKubernetesServices": false
  },
  "images": {
    "showAll": true,
    "namespace": "k8s.io"
  },
  "diagnostics": {
    "showMuted": false,
    "mutedChecks": {}
  },
  "application": {
    "adminAccess": true,
    "debug": false,
    "pathManagementStrategy": "notset",
    "telemetry": {
      "enabled": true
    },
    "updater": {
      "enabled": false
    },
    "autoStart": false,
    "startInBackground": false,
    "hideNotificationIcon": false,
    "window": {
      "quitOnClose": false
    }
  },
  "virtualMachine": {
    "hostResolver": true,
    "memoryInGB": 2,
    "numberCPUs": 2
  },
  "experimental": {
    "virtualMachine": {
      "networkingTunnel": true, // <--- this is the setting
      "socketVMNet": false,
      "mount": {
        "type": "reverse-sshfs",
        "9p": {
          "securityModel": "none",
          "protocolVersion": "9p2000.L",
          "msizeInKB": 128,
          "cacheMode": "mmap"
        }
      }
    }
  },
  "WSL": {
    "integrations": {}
  },
  "autoStart": false,
  "startInBackground": false,
  "hideNotificationIcon": false,
  "window": {
    "quitOnClose": false
  }
}

When networkingTunnel is enabled, a separate network namespace is created in the rancher-desktop WSL distro. The namespace is configured with a network interface that forwards all traffic destined for eth0 within the namespace to the host, making it appear as though the traffic originated from the host. This allows VPN clients to handle the routing that previously failed when corporate VPNs were in use.

Important note: for the 1.8 release, when this experimental feature is enabled, port forwarding has to be performed manually. For more details, please take a look at this issue. For example, when you are required to expose a port:

docker run --name mynginx1 -p 8801:80 -d nginx

You can then manually expose it:

rdctl shell curl http://192.168.127.1:80/services/forwarder/expose -X POST -d '{\"local\":\":8801\",\"remote\":\"192.168.127.2:8801\"}'

And unexpose it:

rdctl shell curl http://192.168.127.1:80/services/forwarder/unexpose -X POST -d '{\"local\":\":8801\",\"remote\":\"192.168.127.2:8801\"}'

Furthermore, WSL integration and the additional features offered by the CLI server will not be enabled. However, these features should be re-enabled in upcoming releases.

gaktive commented 1 year ago

We'll need other tickets to expose this as experimental and then to handle the network namespacing for WSL. We also need to flesh out the docs side of this some more.