helloSystem / ISO

helloSystem Live and installation ISO
https://github.com/helloSystem/
BSD 3-Clause "New" or "Revised" License

Macs with BCM57765 Ethernet Controller: System unresponsive with network cable plugged in #355

Open probonopd opened 2 years ago

probonopd commented 2 years ago

Discussed in https://github.com/helloSystem/hello/discussions/297

Originally posted by **patmaddox** December 23, 2021

I'm trying to get started with 0.7 on a 2013 iMac. Something seems to not like my NIC (which uses `bge` driver according to `ifconfig`).

## Can't interact with UI when booting with network cable plugged in (doesn't finish booting?)

If I boot from USB with the network cable plugged in, it seems to not finish booting. I see the desktop, and an icon called `Starvolume`, but I don't see `Live` or `EFI` (which is what I see when I boot without a cable). The mouse and keyboard are non-responsive.

## Attaching network cable after booting makes UI non-interactive

If I boot from USB without the cable plugged in, I get a responsive UI and can open applications. I've opened the Terminal and looked at `ifconfig`. After I plug the network cable in, I can still use the Terminal and see that the NIC has been configured via DHCP.

Next I try to launch an application - `Create Live Media` so I can install helloSystem onto the internal hard drive. At this point I'm unable to interact with the UI at all. I can move the mouse cursor around, but not click anything, and the keyboard is unresponsive. I can no longer interact with Terminal either.

---

I don't have a clue how to debug this since I can't interact with anything after plugging in the network cable. Any ideas?
probonopd commented 2 years ago

On Mac mini Mid 2010 (Macmini4,1) (which contains the NetXtreme BCM57765 Gigabit Ethernet PCIe) helloSystem 0.7.0 becomes unresponsive when the Ethernet cable is plugged in. If the Logs utility is open, the last message seen in `/var/log/messages` is `bge0: watchdog timeout -- resetting`.

Even with dev.bge.0.msi=0 in /boot/loader.conf, the system becomes unresponsive as soon as the Ethernet cable is plugged in.

Logs:

[Screenshot: log output]


probonopd commented 2 years ago

According to https://www.truenas.com/community/threads/error-bge0-watchdog-timeout-resetting.64345/, hw.bge.allow_asf=0 is needed. When entering this manually at the bootloader prompt (in addition to dev.bge.0.msi=0), then booting the helloSystem Live ISO, then plugging in the network cable, the system reboots after a few seconds! If booting with the network cable already plugged in, the desktop is unresponsive from the beginning.

probonopd commented 2 years ago

Possibly related: `[zone: mbuf_cluster] kern.ipc.nmbufs limit reached` appears immediately after `bge0: link state changed to UP`.

https://forum.netgate.com/topic/95822/kern-ipc-nmbufs-limit-reached/9 points to https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards which suggests additional loader.conf entries.

Using:

dev.bge.0.msi=0
hw.bge.allow_asf=0
kern.ipc.nmbclusters="1000000"
hw.bce.tso_enable=0
hw.pci.enable_msix=0

This results in the system not crashing, but the network is still not functional.

The interface gets an IP address using DHCP but DNS resolution does not work and pinging it from another device leads to massive packet loss:

FreeBSD% ping 192.168.0.178
PING 192.168.0.178 (192.168.0.178): 56 data bytes
64 bytes from 192.168.0.178: icmp_seq=0 ttl=64 time=0.638 ms
64 bytes from 192.168.0.178: icmp_seq=3 ttl=64 time=0.697 ms
64 bytes from 192.168.0.178: icmp_seq=6 ttl=64 time=0.661 ms
64 bytes from 192.168.0.178: icmp_seq=9 ttl=64 time=1.132 ms
64 bytes from 192.168.0.178: icmp_seq=10 ttl=64 time=0.744 ms
64 bytes from 192.168.0.178: icmp_seq=13 ttl=64 time=0.746 ms
^C
--- 192.168.0.178 ping statistics ---
16 packets transmitted, 6 packets received, 62.5% packet loss
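
To make the loss pattern easier to see, here is a minimal Python sketch that lists which `icmp_seq` numbers never got a reply and recomputes the loss percentage. The helper names (`missing_sequences`, `packet_loss_percent`) are made up for illustration and are not part of any tool used above; the sample text is the ping output from this comment.

```python
import re


def missing_sequences(ping_output: str, transmitted: int) -> list[int]:
    """Return the icmp_seq numbers (0-based) that have no reply line."""
    seen = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", ping_output)}
    return [seq for seq in range(transmitted) if seq not in seen]


def packet_loss_percent(transmitted: int, received: int) -> float:
    """Packet loss as a percentage of transmitted packets."""
    return 100.0 * (transmitted - received) / transmitted


# Reply lines copied from the ping output above.
sample = """\
64 bytes from 192.168.0.178: icmp_seq=0 ttl=64 time=0.638 ms
64 bytes from 192.168.0.178: icmp_seq=3 ttl=64 time=0.697 ms
64 bytes from 192.168.0.178: icmp_seq=6 ttl=64 time=0.661 ms
64 bytes from 192.168.0.178: icmp_seq=9 ttl=64 time=1.132 ms
64 bytes from 192.168.0.178: icmp_seq=10 ttl=64 time=0.744 ms
64 bytes from 192.168.0.178: icmp_seq=13 ttl=64 time=0.746 ms
"""

lost = missing_sequences(sample, transmitted=16)
print(lost)                                     # [1, 2, 4, 5, 7, 8, 11, 12, 14, 15]
print(packet_loss_percent(16, 16 - len(lost)))  # 62.5
```

The replies arrive in short bursts separated by gaps of two lost packets, which fits an interface that keeps dropping and re-establishing its link.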

dmesg is showing

bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
...
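
A rough way to quantify the flapping is to count UP to DOWN transitions in the log text. The following is only a sketch in Python; `count_link_flaps` is a hypothetical helper name, and it assumes the log lines look like the `dmesg` excerpt above (optionally with a syslog prefix in front).

```python
import re

# Matches e.g. "bge0: link state changed to UP", with or without a syslog prefix.
LINK_RE = re.compile(r"(\w+): link state changed to (UP|DOWN)")


def count_link_flaps(log_text: str, interface: str = "bge0") -> int:
    """Count UP -> DOWN transitions for one interface in the given log text."""
    flaps = 0
    previous = None
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if not match or match.group(1) != interface:
            continue  # unrelated line, or a different interface
        state = match.group(2)
        if previous == "UP" and state == "DOWN":
            flaps += 1
        previous = state
    return flaps


# The excerpt shown above.
excerpt = """\
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
"""

print(count_link_flaps(excerpt))  # 4
```

On a live system the text could come from something like `dmesg | python3 flaps.py` with the script reading `sys.stdin`; that wiring is left out here.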

Same result when only using

dev.bge.0.msi=0
kern.ipc.nmbclusters="1000000"
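
To keep track of which tunables were dropped between the two attempts, the two sets of `loader.conf` entries can be compared with a few lines of Python. This is just a sketch; `parse_tunables` is a hypothetical helper that handles simple `name=value` lines only, not the full `loader.conf` syntax.

```python
def parse_tunables(text: str) -> dict[str, str]:
    """Parse simple name=value lines, ignoring comments and blank lines."""
    tunables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


# First attempt: all five entries.
full = """\
dev.bge.0.msi=0
hw.bge.allow_asf=0
kern.ipc.nmbclusters="1000000"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
"""

# Second attempt: only two entries.
reduced = """\
dev.bge.0.msi=0
kern.ipc.nmbclusters="1000000"
"""

dropped = sorted(set(parse_tunables(full)) - set(parse_tunables(reduced)))
print(dropped)  # ['hw.bce.tso_enable', 'hw.bge.allow_asf', 'hw.pci.enable_msix']
```

Since the result was the same with both sets, the three dropped tunables made no observable difference in this test.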
probonopd commented 2 years ago

https://lists.freebsd.org/pipermail/freebsd-net/2011-April/028728.html says

NVIDIA bridge controller is known to have MSI issues for a long time in FreeBSD

While I don't entirely understand what this all means, the Macmini4,1 does have an NVIDIA MCP89 chipset.

probonopd commented 2 years ago

Possibly @landonf knows how to get the Ethernet part of the BCM57765 in the Macmini4,1 to work properly on FreeBSD?

probonopd commented 2 years ago

https://wiki.freebsd.org/IntelMacMini says

To get bge(4) working, add the following to /boot/loader.conf: hw.pci.enable_msi="0"

So it was working at some point?

probonopd commented 2 years ago

https://wiki.freebsd.org/AppleMacbook mentions under "Wired network seems to hang": hw.msk.msi_disable=1.

TODO: Test whether this helps.

probonopd commented 2 years ago

According to https://bsd-hardware.info/?id=pci:14e4-16b4-14e4-16b4 (at least) these devices use it:

probonopd commented 2 years ago

Submitted upstream: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260881

probonopd commented 2 years ago

A user with an Acer Aspire v3-571g (likely containing a BCM57785) is reporting a similar symptom here.

probonopd commented 2 years ago

Macmini4,1: