xaptum / xaprc

Drivers and firmware for the Xaptum ENF router cards

CDC Ethernet driver fails to assign correct mac address #13

Closed drbild closed 5 years ago

drbild commented 5 years ago

The g_ether gadget driver takes a host_addr parameter to configure the MAC address of the interface on the host.

However, this doesn't work on all our test hardware. On some gateways the host seems to select a different (perhaps random?) MAC address.

That breaks everything, since xbridge addresses frames to the MAC address that it thinks should have been assigned.

Working Systems

Broken Systems

peterhaijen commented 5 years ago

The MAC address used by g_ether is determined in xbridge-setup:

MAC=`cat /sys/class/net/wlan0/address`
modprobe g_ether host_addr=$MAC

If the wlan0 interface does not exist, then $MAC will be empty, and a random MAC address will be assigned to the host interface of the CDC driver.

Is it possible that on a broken system, there is no wlan0 interface?
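One way to rule out the empty `$MAC` case is to validate the address before loading the module. The following is a minimal sketch, not the actual xbridge-setup code; `read_mac` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of xbridge-setup): print the MAC stored in
# a sysfs address file, or fail if the file is missing or malformed.
read_mac() {
    addr_file="$1"
    # A missing wlan0 interface means the address file does not exist.
    [ -r "$addr_file" ] || return 1
    mac=$(cat "$addr_file")
    # Accept only six colon-separated hex octets.
    printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || return 1
    printf '%s\n' "$mac"
}

# Usage on the router card (as root):
#   MAC=$(read_mac /sys/class/net/wlan0/address) || {
#       echo "wlan0 has no usable MAC address" >&2
#       exit 1
#   }
#   modprobe g_ether host_addr="$MAC"
```

Failing loudly here would make a missing wlan0 visible in the logs instead of silently falling back to a random host MAC.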

drbild commented 5 years ago

I don't think it's a missing wlan0 interface because

1) This is with the same router card. When attached to the VM, the host interface gets the correct MAC. When attached to the physical gateway, the host interface does not.

2) The router card functions on the "broken" machine, after I explicitly force the host interface MAC address to the correct value.

peterhaijen commented 5 years ago

The MAC address for the wlan0 interface is determined twice, independently:

Both should produce the same result, and both are logged:

# ifconfig wlan0|grep HWaddr
wlan0     Link encap:Ethernet  HWaddr 68:CA:00:01:4B:6D  
# journalctl |grep MAC
Jan 29 18:31:16 buildroot kernel: usb 1-2: RTL8192EU MAC: 68:ca:00:01:4b:6d
Jan 29 18:40:04 buildroot kernel: usb0: HOST MAC 68:ca:00:01:4b:6d
Jan 29 18:40:04 buildroot kernel: usb0: MAC 9a:05:49:1c:df:8a
Jan 29 18:40:11 buildroot xbridge[1366]: Interface usb0: using MAC 9a:05:49:1c:df:8a
Jan 29 18:40:11 buildroot xbridge[1366]: Interface enf0: using MAC 68:ca:00:01:4b:6d

This has been 100% reliable on my setup. I also tried it today on another Ubuntu laptop I have here, and it works fine. Since you're using the same router card in both scenarios, I'll assume that the router card itself always produces output similar to what I've described above.

I've been Googling for this, and so far I have found only one relevant write-up. It first describes how to set the g_ether host_addr through a file /etc/modprobe.d/g_ether.conf, then explains that this suddenly stopped working, and finally proposes an alternative: setting the MAC with the kernel command line option g_ether.host_addr=68:ca:00:01:4b:6d.

Can you try setting the correct MAC by adding this as a kernel cmdline option?

peterhaijen commented 5 years ago

I've appended g_ether.host_addr=68:ca:00:01:4b:6d to my U-Boot bootargs variable. After booting, when I run modprobe g_ether without specifying a MAC, g_ether uses the MAC specified on the kernel command line, and my host has the expected MAC as well.
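To confirm that the option actually reached the kernel, the value can be read back from the command line. This is a rough sketch; `cmdline_host_addr` is a hypothetical helper name, and it assumes the running kernel's command line is available in /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: print the value of g_ether.host_addr from a kernel
# command line file (defaults to /proc/cmdline). Prints nothing if absent.
cmdline_host_addr() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each option on its own line, then keep the last matching value.
    tr ' ' '\n' < "$cmdline_file" \
        | sed -n 's/^g_ether\.host_addr=//p' \
        | tail -n 1
}

# Usage on the router card:
#   cmdline_host_addr
#   cat /sys/class/net/wlan0/address    # should match, per xbridge-setup
```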

drbild commented 5 years ago

Found the root cause. The physical host is using the cdc_subset driver instead of cdc_ether. cdc_subset always assigns a random MAC address. It doesn't read one from the device.

This appears to be a race condition at boot between the loading of the drivers and enumeration of the USB device. See the following screenshot.

On boot, cdc_subset attaches to the device before the cdc_ether driver is registered. The correct cdc_ether driver is used after a forced re-enumeration of the USB device (unbind/bind).

[Screenshot: img_20190131_121143]

I don't have a solution yet, but maybe you'll find something.
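For anyone reproducing this, the driver bound to the host interface can be read from sysfs. The sketch below is illustrative only; `iface_driver` is a hypothetical helper, and the interface name `usb0` and USB bus ID `1-2` in the usage comments are placeholders that will differ per host.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver bound to a network
# interface. $1 = interface name, $2 = sysfs root (defaults to /sys).
iface_driver() {
    iface="$1"
    sysfs_root="${2:-/sys}"
    link="$sysfs_root/class/net/$iface/device/driver"
    # The driver entry is a symlink into /sys/bus/usb/drivers/<name>.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Usage on the host (usb0 and 1-2 are placeholders):
#   iface_driver usb0     # "cdc_subset" means the wrong driver won the race
#   # Force re-enumeration so that cdc_ether can claim the device (as root):
#   echo 1-2 > /sys/bus/usb/drivers/usb/unbind
#   echo 1-2 > /sys/bus/usb/drivers/usb/bind
```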

drbild commented 5 years ago

This can happen because the router card is using the RNDIS_VENDOR_NUM and RNDIS_PRODUCT_NUM from usb/gadget/legacy/ether.c. The cdc_subset driver explicitly registers for that VID and PID.

Disabling RNDIS or using a different VID/PID should prevent the cdc_subset driver from binding.

We want to use our own VID and PID anyway, so that should take care of it.

peterhaijen commented 5 years ago

In the meantime, could you blacklist cdc_subset on the troublesome machine?
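A blacklist entry could be written roughly as follows. This is a sketch under assumptions: `blacklist_module` is a hypothetical helper, the file name is arbitrary, and the host is assumed to read /etc/modprobe.d. A `blacklist` line only stops alias-based autoloading, so it has no effect if cdc_subset is built into the host kernel.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe blacklist entry for a module.
# $1 = module name, $2 = config directory (defaults to /etc/modprobe.d).
blacklist_module() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$module" > "$conf_dir/blacklist-$module.conf"
}

# Usage on the troublesome host (as root):
#   blacklist_module cdc_subset
#   rmmod cdc_subset    # unload it if it is already loaded
```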

drbild commented 5 years ago

Fixed.