utmapp / UTM

Virtual machines for iOS and macOS
https://getutm.app
Apache License 2.0

Ubuntu Server 20.04 VM networking stops working intermittently #3372

Open tallytarik opened 2 years ago

tallytarik commented 2 years ago

Describe the issue

I'm running an Ubuntu Server 20.04 VM set up according to the setup guide.

Every so often, the VM networking stops working while it is running. The VM can no longer access the internet, and I can no longer SSH from the host to the VM. The VM console window still works, and I'm able to log in and use the VM that way. I can shut down and restart the VM, and networking works again.

I've noticed that when networking stops working, the CPU usage for QEMULauncher sits at 100%. Nothing inside the VM (checked with htop) is using this much CPU.
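A rough way to capture this observation on the host is to filter `ps` output for the process name. This is only a sketch: the helper name `high_cpu` and the 90% threshold are made up for illustration, and the process name is matched as a plain substring.

```shell
#!/bin/bash
# high_cpu NAME THRESHOLD (hypothetical helper, not part of UTM):
# read `ps -Ao %cpu,comm` output on stdin and print the lines for
# processes whose command contains NAME and whose CPU usage is at
# or above THRESHOLD percent. The first line (the header) is skipped.
high_cpu() {
        awk -v name="$1" -v limit="$2" \
                'NR > 1 && index($0, name) && $1 + 0 >= limit + 0 { print }'
}

# Example (run on the macOS host):
#   ps -Ao %cpu,comm | high_cpu QEMULauncher 90
```

Running it from a loop or a cron job while the VM is up would give timestamps to compare against the moments networking drops.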

It happens randomly - I can't reproduce it on demand. I've been running this VM daily for a couple of weeks, and I've seen this issue ~5 times. Once it happened twice (after a restart) within about 5 minutes.

Configuration

Crash log

N/A

Debug log

Will add ASAP. Sorry, I enabled debug logging earlier, but have since restarted the VM. I'll wait for the issue to happen again and attach the debug log.

Upload VM

config.plist.txt

tallytarik commented 2 years ago

The CPU usage might be unrelated. I've just had the CPU issue again - where QEMULauncher is at a minimum of 100% - but the VM networking is still working fine.

prabhah commented 2 years ago

Same problem. Port forwarding stops working, and I cannot SSH to the VM from the host.

Configuration

UTM Version: 2.4.1
OS Version: macOS Monterey 12.0.1
Intel or Apple Silicon? Apple (M1 Pro)
Shared networking

tallytarik commented 2 years ago

I've had it happen again just now.

Turns out the debug log is not particularly exciting:

Running:  -L /Applications/UTM.app/Contents/Resources/qemu -S -qmp tcp:127.0.0.1:4000,server,nowait -nodefaults -vga none -spice "unix=on,addr=/Users/tallytarik/Library/Group Containers/WDNLXAD4W8.com.utmapp.UTM/257404A5-9A02-474C-AD00-CF75ADFF1F1E.spice,disable-ticketing=on,image-compression=off,playback-compression=off,streaming-video=off,gl=on" -device virtio-ramfb-gl -cpu cortex-a72 -smp cpus=8,sockets=1,cores=8,threads=1 -machine virt,highmem=off -accel hvf -accel tcg,tb-size=1500 -drive if=pflash,format=raw,unit=0,file=/Applications/UTM.app/Contents/Resources/qemu/edk2-aarch64-code.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=/Users/tallytarik/Library/Containers/com.utmapp.UTM/Data/Documents/DockerUbuntu.utm/Images/efi_vars.fd -boot menu=on -m 6000 -device intel-hda -device hda-duplex -name DockerUbuntu -device qemu-xhci,id=usb-bus -device usb-tablet,bus=usb-bus.0 -device usb-mouse,bus=usb-bus.0 -device usb-kbd,bus=usb-bus.0 -device ich9-usb-ehci1,id=usb-controller-0 -device ich9-usb-uhci1,masterbus=usb-controller-0.0,firstport=0,multifunction=on -device ich9-usb-uhci2,masterbus=usb-controller-0.0,firstport=2,multifunction=on -device ich9-usb-uhci3,masterbus=usb-controller-0.0,firstport=4,multifunction=on -chardev spicevmc,name=usbredir,id=usbredirchardev0 -device usb-redir,chardev=usbredirchardev0,id=usbredirdev0,bus=usb-controller-0.0 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=usb-controller-0.0 -chardev spicevmc,name=usbredir,id=usbredirchardev2 -device usb-redir,chardev=usbredirchardev2,id=usbredirdev2,bus=usb-controller-0.0 -device virtio-blk-pci,drive=drive0,bootindex=0 -drive if=none,media=disk,id=drive0,file=/Users/tallytarik/Library/Containers/com.utmapp.UTM/Data/Documents/DockerUbuntu.utm/Images/disk-0.qcow2,cache=writethrough -device usb-storage,drive=drive1,removable=true,bootindex=1,bus=usb-bus.0 -drive if=none,media=cdrom,id=drive1 -device 
virtio-net-pci,mac=E6:84:EB:2B:78:64,netdev=net0 -netdev vmnet-macos,mode=shared,id=net0 -device virtio-serial -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -chardev spicevmc,id=vdagent,debug=0,name=vdagent -uuid 257404A5-9A02-474C-AD00-CF75ADFF1F1E -rtc base=localtime
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: Started vmnet interface with configuration:
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: MTU:              1500
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: Max packet size:  1514
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: MAC:              c2:af:bc:c3:cf:9e
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: DHCP IPv4 start:  192.168.64.1
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: DHCP IPv4 end:    192.168.64.254
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: IPv4 subnet mask: 255.255.255.0
qemu-aarch64-softmmu: -netdev vmnet-macos,mode=shared,id=net0: info: UUID:             216D6223-C430-48C2-B471-9E298EB4A802
qemu-aarch64-softmmu: warning: Spice: playback:0 (0x14486e920): setsockopt failed, Operation not supported on socket
qemu-aarch64-softmmu: warning: Spice: record:0 (0x14486e9d0): setsockopt failed, Operation not supported on socket
gl_version 30 - es profile enabled
WARNING: running without ARB/KHR robustness in place may crash

tallytarik commented 2 years ago

I was able to restore networking by running this in the console window:

ip link set dev enp0s9 down
ip link set dev enp0s9 up

polynomialspace commented 2 years ago

Just upgraded to Monterey 12.1 and ran into the same issue. Headless (console) Linux VM Guest.

By the look of things it's initially getting APIPA / ULA addresses:

2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP g0
    link/ether 9a:66:8c:d6:98:4e brd ff:ff:ff:ff:ff:ff
    inet 169.254.234.93/16 brd 169.254.255.255 scope global noprefixroute enp0s6
       valid_lft forever preferred_lft forever
    inet6 fd14:97e6:d2a3:5250:16b1:dd1:b42e:cd54/64 scope global temporary dyna 
       valid_lft 604780sec preferred_lft 86163sec
    inet6 fd14:97e6:d2a3:5250:9866:8cff:fed6:984e/64 scope global dynamic mngtm 
       valid_lft 2591980sec preferred_lft 604780sec
    inet6 fe80::9866:8cff:fed6:984e/64 scope link 
       valid_lft forever preferred_lft forever
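A 169.254.x.x address like the one above usually means the guest never got a DHCP lease. As a minimal sketch of how that could be detected from a script, the two helpers below (names `is_apipa` and `ipv4_addrs` are invented for this example) pull IPv4 addresses out of `ip -4 -o addr show` output and test them against the 169.254.0.0/16 link-local range.

```shell
#!/bin/bash
# is_apipa ADDR (hypothetical helper): succeed if ADDR is an IPv4
# link-local (APIPA) address, i.e. inside 169.254.0.0/16.
is_apipa() {
        case "$1" in
                169.254.*) return 0 ;;
                *)         return 1 ;;
        esac
}

# ipv4_addrs (hypothetical helper): read the one-line-per-address
# output of `ip -4 -o addr show dev IFACE` on stdin and print each
# address without its prefix length.
ipv4_addrs() {
        awk '$3 == "inet" { split($4, a, "/"); print a[1] }'
}

# Example (inside the guest; enp0s6 is the interface from this comment):
#   ip -4 -o addr show dev enp0s6 | ipv4_addrs | while read -r addr; do
#           is_apipa "$addr" && echo "$addr is link-local: no DHCP lease?"
#   done
```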

DHCP bug maybe? It seems to depend on the firewall. Previously on Big Sur I was running "Drop all incoming connections" and later went down to just turning on stealth mode; neither seemed to interrupt UTM. Now it appears that having the firewall enabled at all breaks DHCP (even setting UTM.app and QEMULauncher.app to "Allow incoming connections" doesn't help).

Setting the IP manually (to what DHCP would normally provide) seems to work, although that might be coincidental:

ip link set enp0s6 down
systemctl stop dhcpcd
ip link set enp0s6 up
ip addr add 192.168.64.4/24 dev enp0s6 
ip ro add default via 192.168.64.1

The firewall settings (screenshot attached) are relatively tame and feel like they shouldn't be causing an issue; perhaps something is up with the vmnet-macos QEMU driver?

tallytarik commented 2 years ago

@polynomialspace Thanks for doing some extra digging!

After I read your comment I tried disabling the firewall, but I just saw the VM networking die again - with firewall off. So maybe it's not a factor?

thisisthekap commented 2 years ago

Got the issue with an Ubuntu 20.04.3 guest (x64) on macOS 12.1 (21C52) (Apple Silicon).

Pratyush commented 2 years ago

I've been encountering the same issue, and the fix by @tallytarik works to fix it at least temporarily.

ip link set dev enp0s9 down
ip link set dev enp0s9 up

ml-costmo commented 2 years ago

Here to add a "me too" on Apple M1 host and arm64 Ubuntu guest.

Networking simply stops working, suddenly and without any repeatable causal pattern, as far as I can discern.

I cannot SSH into the guest, nor can I reach it from any sources that are external to the VM. The window manager still works, and I can either restart the interface as outlined above or reboot the VM as a workaround until it strikes again.

I do not have the symptom of "uses 100% CPU" during those times. Other than network connections failing, the system seems to be running as expected.

sokurenko commented 2 years ago

For me it occurred while I was on a call on the host OS using Microsoft Teams; I don't know if that's a coincidence. Having a workaround helps, thanks!

agaffney commented 2 years ago

I've been seeing this too. I originally thought that it was happening around network changes, because I would find the VM networking dead after the wifi disconnects (due to the issue with 80MHz channel width). It has been better since I addressed that a few days ago, but it's happened twice today without an associated loss of wifi connectivity. I checked for it around other network transitions (connecting and disconnecting a VPN), but it seemed to still be working. I then found it dead again about 30 minutes later.

agaffney commented 2 years ago

My Ubuntu VM network just dropped again in the middle of using it, but the ip link set dev <device> down/up command suggested above worked to bring it back.

pawelwiejkut commented 2 years ago

Hi all,

This issue is very easy to reproduce on the latest openSUSE Leap by sharing a big file (over 10 GB) via SSH. It freezes every time. I can provide logs if necessary.


agaffney commented 2 years ago

Something similar seems to happen when using Lima, which also uses QEMU.

mcfriend99 commented 2 years ago

ip link set dev enp0s9 down

I keep getting zsh: command not found: ip error. Any help here??

agaffney commented 2 years ago

That sounds like you don't have the iproute2 package installed. It should be a pretty standard part of most Linux distributions these days. There should also be an equivalent command in the older ifconfig utility.
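For guests that only ship the older net-tools package, the equivalent of the workaround is `ifconfig <device> down` followed by `ifconfig <device> up`. The sketch below picks whichever tool is installed and prints the matching command pair; the function names `link_tool` and `bounce_commands` are made up for this example, and the commands still have to be run as root inside the guest, not on the macOS host.

```shell
#!/bin/bash
# link_tool (hypothetical helper): print "ip" or "ifconfig" depending
# on which one is installed; fail if neither is found.
link_tool() {
        if command -v ip >/dev/null 2>&1; then
                echo ip
        elif command -v ifconfig >/dev/null 2>&1; then
                echo ifconfig
        else
                return 1
        fi
}

# bounce_commands TOOL IFACE (hypothetical helper): print the down/up
# command pair for the given tool and interface, without running it.
bounce_commands() {
        case "$1" in
                ip)       printf 'ip link set dev %s down\nip link set dev %s up\n' "$2" "$2" ;;
                ifconfig) printf 'ifconfig %s down\nifconfig %s up\n' "$2" "$2" ;;
                *)        return 1 ;;
        esac
}

# Example (inside the guest):
#   bounce_commands "$(link_tool)" enp0s9
```

Printing the commands rather than executing them keeps the sketch safe to try; pipe the output to `sudo sh` once the interface name is confirmed.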

mcfriend99 commented 2 years ago

Oh... I was running it on the host machine. The commands ran successfully on the Ubuntu VM, but the network problem persists. Any help here??

armen-y commented 2 years ago

Same issue for me as well. I have noticed it with the pre-built Ubuntu 20.04 image from the gallery. All networking services become randomly unavailable.

ip link set dev enp0s9 down
ip link set dev enp0s9 up

This solution is the only one I have found so far.

Quentin-Guyot commented 2 years ago

Same issue here. Sometimes restarting the VM is enough to reconnect to it, and sometimes it isn't and I need to restart the Mac completely. I will try the command to bring the network link down and up to see if it helps.

adeadman commented 1 year ago

I'm also getting this, Ubuntu 22.10 guest, macOS 13.3.1 host on Intel, running on the Apple Virtualisation backend (not qemu). Toggling the network interface on/off from the gnome shell top right menu also works to restore connectivity, at which point VPN etc. must be reconnected.

Similar to a previous commenter, I am using Microsoft Teams on the host mac and it might be correlated. I also have a USB-C ethernet interface and subjectively seem to experience the networking failures more when both that and wifi are plugged in and enabled, although I get it when just using wifi as well.

thedarb commented 1 year ago

I had this same issue and it was driving me nuts. I finally decided to try changing the Emulated Network Card from 'virtio-net-pci' to 'virtio-net-device'. It's been a week now without any loss of network.

Now if I can just get GL display drivers to not randomly lock up, this work mac can be a Linux desktop all the time. Sooo close I can taste it.

sokurenko commented 1 year ago

This works for me and needs to be done after every reboot: dhclient enp0s1

dabaer commented 1 year ago

I started getting this issue within the last two months (which is weird, as this is a very old issue). On both Ubuntu and RHEL the network completely dies, usually after hours or days, but now more frequently: sometimes after minutes or an hour.

I can work around this by setting the network device to virtio-net-device, but you can only do this on one VM at a time, so only one of my VMs avoids this issue.

dtinth commented 1 year ago

I ended up creating this script that periodically checks if the internet is up, and if not, automatically restarts the interface, based on the solution in this issue.

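#!/bin/bash
# Watchdog: every 15 seconds, check internet access and restart the
# network interface if the check fails.

# Bringing a link down and up requires root.
if [ "$(whoami)" != root ]
then
        echo "This command must be run as root" >&2
        exit 1
fi

while true
do
        now=$(date +'%Y-%m-%dT%H:%M:%S')
        # curl exits non-zero if the connection fails or takes longer than 5 seconds.
        if curl --silent --show-error --max-time 5 https://cloudflare-ipfs.com/ipfs/bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
        then
                echo "$now - Internet is OK"
        else
                echo "$now - Internet is not OK"
                # Restart the interface (enp0s1 here; adjust to your device name).
                ip link set enp0s1 down
                ip link set enp0s1 up
        fi
        sleep 15
done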

Notes:

antun commented 10 months ago

This is happening for me too, although the solution to restart networking with sudo ip link set enp0s1 down / sudo ip link set enp0s1 up does not resolve it. Only a full shut-down and restart seems to fix it.

Host environment: Mac 14.2.1 (23C71) (Intel)
Guest environment: Ubuntu 20.04.2 with UI
UTM version: Version 4.4.5 (94)

I've also tried disabling/enabling networking through the guest Ubuntu UI, but that doesn't work either.

chrisvanmeer commented 9 months ago

Same here, after running the latest updates on Ubuntu 22.04.03 LTS.
Intermittent loss of network. Updated the UTM app and now using the script @dtinth provided.

The strange thing is, this is a VM that has been running over 6 months, and just this week - after the updates, it started falling apart.

illixion commented 9 months ago

I've had a similar issue, except it occurs around once a day. I figured out that it's related to Netplan not applying the default route when updating the DHCP lease. More info on this can be found here; replacing Netplan with NetworkManager solved this for me.
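To tell this failure mode apart from a dead link, it may help to check whether the default route is still present when connectivity drops. A minimal sketch, assuming the usual `ip route show` output format where the default route line starts with `default`; the helper name `has_default_route` is invented for this example.

```shell
#!/bin/bash
# has_default_route (hypothetical helper): read `ip route show` output
# on stdin and succeed if it contains a default route.
has_default_route() {
        grep -q '^default '
}

# Example (inside the guest):
#   ip route show | has_default_route || echo "default route is missing"
```

If the route turns out to be missing, re-adding it by hand (for example `ip route add default via 192.168.64.1`, using the gateway address mentioned earlier in this thread) is a quicker test than restarting the interface.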

chrisvanmeer commented 9 months ago

@illixion hmm mine was already set to NetworkManager.
The script to keep the connection alive works for me as a workaround.