lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0

Enable guest-to-guest networking by default, with (gvisor-based) usermode networking #1222

Closed AkihiroSuda closed 1 year ago

AkihiroSuda commented 1 year ago

Currently the guests cannot talk to each other because each of them has its own dedicated libslirp stack.

Lima should launch a (gvisor-based) usermode networking daemon and connect the guests to it by default so as to provide guest-to-guest networking. The usermode networking daemon has to provide DHCP for the IP range that can be configured via ~/.lima/_config/networks.yaml.
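As a rough illustration of the DHCP requirement, here is a minimal sketch of handing out addresses from a configured range. It is not Lima's or gvisor-tap-vsock's implementation: the `leasePool` type, the choice to reserve the first host address for a gateway, and the example CIDR are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// leasePool is a hypothetical in-memory allocator for one IPv4 range.
type leasePool struct {
	prefix netip.Prefix
	next   netip.Addr
	leases map[string]netip.Addr // keyed by guest MAC address
}

// newLeasePool parses a CIDR such as "192.168.127.0/24".
// Assumption: the network address is skipped and the first host
// address is reserved for a gateway, so leases start at the second.
func newLeasePool(cidr string) (*leasePool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked()
	return &leasePool{
		prefix: p,
		next:   p.Addr().Next().Next(),
		leases: map[string]netip.Addr{},
	}, nil
}

// lease returns a stable address for mac, allocating one on first use.
func (lp *leasePool) lease(mac string) (netip.Addr, error) {
	if ip, ok := lp.leases[mac]; ok {
		return ip, nil
	}
	ip := lp.next
	// Refuse the last address of the range (the broadcast address):
	// the address following ip must still be inside the prefix.
	if !ip.IsValid() || !lp.prefix.Contains(ip) || !lp.prefix.Contains(ip.Next()) {
		return netip.Addr{}, errors.New("address pool exhausted")
	}
	lp.leases[mac] = ip
	lp.next = ip.Next()
	return ip, nil
}

func main() {
	pool, err := newLeasePool("192.168.127.0/24")
	if err != nil {
		panic(err)
	}
	for _, mac := range []string{"52:54:00:00:00:01", "52:54:00:00:00:02", "52:54:00:00:00:01"} {
		ip, err := pool.lease(mac)
		if err != nil {
			panic(err)
		}
		fmt.Println(mac, "->", ip)
	}
	// Prints:
	// 52:54:00:00:00:01 -> 192.168.127.2
	// 52:54:00:00:00:02 -> 192.168.127.3
	// 52:54:00:00:00:01 -> 192.168.127.2
}
```

Because every guest draws from the same pool, two guests attached to the same daemon end up on one subnet, which is the precondition for guest-to-guest traffic.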

In addition, the daemon may provide a virtual HTTP/SOCKS proxy (and/or other VPN-ish stuff) to allow connecting to the guest IP from the host (similar to https://norouter.io/docs/getting-started/vpn/).

We can even consider connecting the network to IaaS instances. That will be a lot of fun.

Alternative to

balajiv113 commented 1 year ago

Are we looking at creating a new (gvisor-based) network with this support, or at using gvisor-tap-vsock?

gvisor-tap-vsock supports multiple guest connections. Only the proxy is not supported, I believe.

AkihiroSuda commented 1 year ago

Are we looking at creating a new (gvisor-based) network with this support, or at using gvisor-tap-vsock?

Either would be fine

gvisor-tap-vsock supports multiple guest connections.

But it doesn't seem to support using SOCK_STREAM (for QEMU) and SOCK_DGRAM (for VZ) sockets simultaneously?

balajiv113 commented 1 year ago

But it doesn't seem to support using SOCK_STREAM (for QEMU) and SOCK_DGRAM (for VZ) sockets simultaneously?

True true, some improvements will be needed on their repo.

Code-Hex commented 1 year ago

@AkihiroSuda I'm currently working on something like vmnet based on gvisor. This was developed because of the slow NAT of the Virtualization.framework.

https://github.com/Code-Hex/gvisor-vmnet

Although not yet documented, it supports these features.

However, so far it has only been tested with vz; I have not tested whether it works with QEMU. I'm sharing this because I would be happy to see it used in lima-vm. There are still a few things to fix, but I would appreciate feedback.

gvisor-vmnet

[Screenshot 2022-12-11 18 57 38]

Virtualization.framework NAT

[Screenshot 2022-12-11 19 00 35]

AkihiroSuda commented 1 year ago

gvisor-vmnet

Thanks, I'm a bit confused about the name. This doesn't seem to use Apple's vmnet.framework?

9.85 Mbps

More optimizations seem needed for Internet connectivity? Do you have guest-to-guest and/or host-to-guest benchmark result?

Code-Hex commented 1 year ago

I'm a bit confused about the name. This doesn't seem to use Apple's vmnet.framework?

Yes. They are different. I simply wanted to express what it means to use gvisor for VM networking. (Would gvisor-router be better?) I'm looking for good names.

More optimizations seem needed for Internet connectivity?

This is simply because my internet is slow. I have confirmed that it performs almost as well as the host.

Do you have guest-to-guest and/or host-to-guest benchmark result?

I haven't tried it yet; the Guest to Guest part still seems to have a bug in the IP address assignment (I confirmed macOS guest only), so I will measure it after fixing it.

Code-Hex commented 1 year ago

Here is sample code for the network configuration.

func createNetworkDeviceConfiguration() (*vz.VirtioNetworkDeviceConfiguration, error) {
    f, err := os.Create("eth.pcap")
    if err != nil {
        return nil, err
    }

    network, err := vmnet.New("192.168.127.0/24",
        vmnet.WithPcapFile(f),
        vmnet.WithLogger(slog.Default()),
        vmnet.WithDNSConfig(&vmnet.DNSConfig{
            StaticRecords: map[string]netip.Addr{
                "codehex.internal.": netip.MustParseAddr("192.168.127.2"),
            },
        }),
    )
    if err != nil {
        return nil, err
    }
    gateway := network.Gateway()

    log.Println("gateway IP:", gateway.IPv4().String())
    log.Println("gateway HW Addr:", gateway.MACAddress().String())

    // Generate a random MAC address for the guest NIC and check the error
    // instead of discarding it.
    ma, err := vz.NewRandomLocallyAdministeredMACAddress()
    if err != nil {
        return nil, err
    }
    dev, err := network.NewLinkDevice(ma.HardwareAddr(),
        vmnet.WithTCPIncomingForward(8888, 22),
    )
    if err != nil {
        return nil, err
    }

    log.Println("Device IP:", dev.IPv4().String())
    log.Println(network.Gateway().Leases()[:3])
    attachment, err := vz.NewFileHandleNetworkDeviceAttachment(dev.File())
    if err != nil {
        return nil, err
    }

    config, err := vz.NewVirtioNetworkDeviceConfiguration(attachment)
    if err != nil {
        return nil, err
    }

    config.SetMACAddress(ma)

    return config, nil
}

AkihiroSuda commented 1 year ago

Yes. They are different. I simply wanted to express what it means to use gvisor for VM networking. (Would gvisor-router be better?) I'm looking for good names.

What about vz-router, vz-usernet-router, or maybe just vz-usernet? It doesn't really use what is usually called gvisor (the ptrace sandbox); it only uses the usermode netstack library written for gvisor.

Code-Hex commented 1 year ago

Thanks.

usernet sounds good. I hope to support QEMU in the near future. (I still have no idea how to do this, so I'm looking for ideas.)

I will report back when I have measured the performance between Guests.

AkihiroSuda commented 1 year ago

QEMU

The protocol is very simple: it is just {length uint32be, ethernetFrame []byte} over an AF_UNIX SOCK_STREAM socket.

Code-Hex commented 1 year ago

@AkihiroSuda Sorry for the late reply.

I took benchmarks in some situations.

However, the Client -> Server case (without the -R option) still does not seem to transfer data well. Honestly, I haven't been able to figure out the cause yet.

In Guest -> Guest, I confirmed that a simple HTTP request can be made, but iperf3 shows that no data is transferred. To solve this, I may first need to fix the Client -> Server data transfer problem 🙇

https://github.com/Code-Hex/gvisor-vmnet/issues/2

balajiv113 commented 1 year ago

I was able to make gvproxy work out of the box with both qemu and vz (using the same pipe that we used for socket_vmnet).

Below is the performance report (MTU 1500).

Guest (VZ) <-> Guest (QEMU)

iperf -c 192.168.127.4
[ 4] 0.00-10.00 sec 770 MBytes 646 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 766 MBytes 643 Mbits/sec receiver

iperf -c 192.168.127.4 -R
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.16 GBytes 995 Mbits/sec 1 sender
[ 4] 0.00-10.00 sec 1.15 GBytes 988 Mbits/sec receiver

Guest (QEMU) <-> Host

iperf -c 192.168.127.254
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.24 GBytes 1.07 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.24 GBytes 1.07 Gbits/sec receiver

iperf -c 192.168.127.254 -R
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.18 GBytes 1.01 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.18 GBytes 1.01 Gbits/sec receiver

Guest (VZ) <-> Host

iperf -c 192.168.127.254
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.11 GBytes 951 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.10 GBytes 949 Mbits/sec receiver

iperf -c 192.168.127.254 -R
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.13 GBytes 970 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.13 GBytes 968 Mbits/sec receiver

balajiv113 commented 1 year ago

In the coming days, I will work on native dgram support, but it requires a good amount of rewriting within gvisor-tap-vsock itself.

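Additionally, I was wondering whether we should plan to do the following as well:

  • Replace libslirp entirely with gvisor-tap-vsock
  • Replace port forwarding via ssh sock to gvisor-tap-vsock
  • Replace our custom dns resolver with dns resolver present here (Some refactoring will be needed to support dynamic update of DNS via API)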

AkihiroSuda commented 1 year ago

Replace libslirp entirely with gvisor-tap-vsock

Yes, but we may have to keep the libslirp mode too for a couple of releases for compatibility reasons.

Replace port forwarding via ssh sock to gvisor-tap-vsock

No, as it doesn't work for localhost ports.

Replace our custom dns resolver with dns resolver present here (Some refactoring will be needed to support dynamic update of DNS via API)

Yes (cc @jandubois)

balajiv113 commented 1 year ago

No, as it doesn't work for localhost ports.

I believe by localhost ports you mean the pseudoloopback forwarders we use to bind to 127.0.0.1 on privileged ports.

AkihiroSuda commented 1 year ago

No, as it doesn't work for localhost ports.

I believe by localhost ports you mean the pseudoloopback forwarders we use to bind to 127.0.0.1 on privileged ports.

Sorry, I meant that the localhost ports in the guest cannot be forwarded with gvisor-tap-vsock.

jandubois commented 1 year ago

Replace our custom dns resolver with dns resolver present here (Some refactoring will be needed to support dynamic update of DNS via API)

Yes (cc @jandubois)

I actually had a meeting with @Nino-K just this morning about switching to gvisor for Rancher Desktop on WSL2, and what would be required (like being able to add our own static names to the resolver).

I hope this all works out and we can use gvisor for all platforms.

Maybe @balajiv113 and @Nino-K can coordinate, as I think Nino is also considering patches to gvisor-tap-vsock to address the additional DNS requirements.

Nino-K commented 1 year ago

I can confirm that it also works out of the box with qemu. However, right now I'm trying to get it to work on Hyper-V/WSL. Something that might be worth noting is that the lack of support for SOCK_DGRAM is mainly due to the underlying library, which only supports connection-oriented net.Conn: https://github.com/mdlayher/vsock/blob/v1.2.0/vsock.go. So that issue may need to be tackled there first 😞

No, as it doesn't work for localhost ports.

I was hoping the localhost ports would also be supported through the API. Just glancing over it here, I didn't come across any limitation, but maybe I need to dig into it deeper.

balajiv113 commented 1 year ago

I will work on native dgram support

I have done this via fd passthrough (vz creates a socketpair and sends the server fd to the gvisor process via a unix socket), and the results look very good.

Results (MTU 1500)

VZ <-> Host
[ 5] 0.00-10.00 sec 2.41 GBytes 2.07 Gbits/sec receiver
[ 5] 0.00-10.00 sec 1.81 GBytes 1.55 Gbits/sec 0 sender

VZ <-> VZ
[ 5] 0.00-10.04 sec 2.33 GBytes 1.99 Gbits/sec 1 sender
[ 5] 0.00-10.05 sec 2.00 GBytes 1.71 Gbits/sec receiver

VZ <-> QEMU
[ 5] 0.00-10.00 sec 1.38 GBytes 1.19 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.41 GBytes 1.22 Gbits/sec receiver

Changes
https://github.com/balajiv113/lima/tree/full-network
https://github.com/containers/gvisor-tap-vsock/pull/175

terminal 1 
limactl slirp 

terminal 2
limactl start default

terminal 3
limactl start vz

Note: This requires a lot more clean-up, but it's testable.

fd passthrough ??

balajiv113 commented 1 year ago

I was hoping the localhost ports would also be supported through the API

@Nino-K - The problem here is that gvisor-tap-vsock only knows how to route via endpoint IPs (like 192.168.1.2).

For example, in the VM:

iperf3 -s -B 127.0.0.1

iperf3 -c 127.0.0.1 # works
iperf3 -c 192.168.1.2 # does not work

In our case, lima receives events from the guest-agent. But if we forward via gvisor-tap-vsock, the connection goes to 192.168.1.2, where that port is not open, so it fails.

balajiv113 commented 1 year ago

@AkihiroSuda If these stats look good, I can clean up and draft a PR with this support.

Some questions would be,

AkihiroSuda commented 1 year ago

@AkihiroSuda If these stats look good, I can clean up and draft a PR with this support.

Thanks 👍

Some questions would be,

  • Should we maintain a separate binary for this network, or is adding it under the lima command itself fine?

I guess it can be just embedded in the limactl binary as in the host agent, but no strong opinion.

AkihiroSuda commented 1 year ago

https://github.com/balajiv113/lima/commit/561fb328f4a0f1c28623480f46968a5063b98e67

AkihiroSuda commented 1 year ago

@balajiv113 Do you plan to submit a PR? 🙏

balajiv113 commented 1 year ago

@AkihiroSuda Yes, the PR is all set. But during testing there was some performance drop over time.

I raised an issue and have been debugging this for some time: https://github.com/containers/gvisor-tap-vsock/issues/182

balajiv113 commented 1 year ago

I will create at least a draft PR by this weekend with support for optionally enabling this network mode. With this, it should even be good to merge as experimental if the code looks good.

We can monitor the performance in parallel as well.

afbjorklund commented 1 year ago

Seems confusing to have two network options both called "user"...

https://wiki.qemu.org/Documentation/Networking#User_Networking_(SLIRP)