lategoodbye / rpi-zero


Raspberry Pi 4: bcmgenet ethernet does not send packets with Linux 5.6.2 #47

Closed stapelberg closed 4 years ago

stapelberg commented 4 years ago

[Filing this as a separate issue to avoid derailing issue #43. Please forgive me if this is not the right place, but I figured y’all would know the most about ethernet on the Raspberry Pi 4 with the upstream linux kernel.]

I’m running upstream Linux 5.6.2 on a Raspberry Pi 4 Model B (in 64 bit mode) and I’m having some trouble getting ethernet to work.

The link comes up, and packets are received: I can see the packets in tcpdump, and the link gets an IPv6 address based on router advertisements.

However, any packets that are sent (I can see them in tcpdump) are not seen on the network by other devices.

Here’s what I have tried/checked so far:

  1. I’m using the official Raspberry Pi USB-C power supply.
  2. I tried swapping ethernet cables.
  3. I tried connecting the Raspberry Pi to my laptop directly. Running tcpdump on my laptop, I cannot see any packets from the Raspberry Pi (the other direction works).
  4. I tried applying kernel patch https://github.com/raspberrypi/linux/commit/049c9831dba88ccf35d851277c95e33f4a77aa62, but that only resulted in “hw csum failure” kernel error messages: https://gist.github.com/stapelberg/4fb444d0b8460179ea99b336560afd44
  5. I tried applying the patch in point 4 and disabling tx/rx checksum offloading in the kernel. That eliminated the “hw csum failure” error messages, but the symptom of not being able to send packets remains the same.
  6. I have verified that 2020-02-13-raspbian-buster-lite.img works as expected with my setup: it can receive and send ethernet packets. I’m concluding it’s most likely a kernel driver issue, not a hardware issue.
  7. I tried just copying the *.[ch] files from raspberrypi/linux into linux 5.6.2, but the files have changed too much for me to get them to compile.
  8. I’m using the most recent firmware (https://github.com/raspberrypi/firmware/commit/c2c6ce8de2dcfd5a6852a32a16003f251)
  9. I’m using the most recent EEPROM (https://github.com/raspberrypi/rpi-eeprom/commit/a5be2ff8b15dd36cb3bb83a3f864514cd9cfcf3e)

Any ideas for what might be wrong here, or what I could try to further diagnose this issue?

Thank you very much in advance!

cc @lategoodbye @pelwell

pelwell commented 4 years ago

If you still have that Buster Lite image around, can you run sudo rpi-update on it and retest? If that works, try BRANCH=next sudo rpi-update to pull in the latest downstream 5.4 kernel.

Since you're up to building your own kernel, you can try reverting to the most recent upstream 5.5 kernel (tag v5.5.15) in an attempt to narrow down when the problem started.

I'll be looking at a checksum offload issue later, so I should be able to confirm that the GENET is vaguely functional on 5.6, but https://github.com/raspberrypi/linux/issues/3523 suggests that it is.

stapelberg commented 4 years ago

Thanks for the tip, I’ll try that later today!

nullr0ute commented 4 years ago

For reference I'm not seeing issues with 5.6.2 on Fedora (aarch64 only, ARMv7 has other issues I need to investigate) on the upstream genet driver.

lategoodbye commented 4 years ago

@stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.

stapelberg commented 4 years ago

> @stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.

Yeah, you’re right:

% grep phy-mode /tmp/*.dts              
/tmp/gokrazy.dts:           phy-mode = "rgmii-rxid";
/tmp/raspbian.dts:          phy-mode = "rgmii";

After changing it to rgmii, the network seems to work!

Thank you so much!
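
For anyone repeating this test, a minimal sketch of switching the PHY mode in a decompiled devicetree source file follows. `set_phy_mode` is a hypothetical helper, not part of any kernel or gokrazy tooling, and it assumes the file contains only the one `phy-mode` property you want to change (it rewrites every match). Note that later in this thread the root cause turns out to be a missing kernel config option, so this is a diagnostic aid rather than a fix.

```shell
#!/bin/sh
# set_phy_mode FILE MODE
# Hypothetical helper: rewrites every phy-mode property in a decompiled
# devicetree source file. A backup of the original is kept as FILE.bak.
set_phy_mode() {
    dts_file=$1
    new_mode=$2
    # Only accept the four RGMII variants discussed in this thread.
    case $new_mode in
        rgmii|rgmii-id|rgmii-rxid|rgmii-txid) ;;
        *) echo "unsupported phy-mode: $new_mode" >&2; return 1 ;;
    esac
    # Match lines of the form:   phy-mode = "<anything>";
    sed -i.bak -E \
        "s/^([[:space:]]*phy-mode[[:space:]]*=[[:space:]]*)\"[^\"]*\";/\1\"$new_mode\";/" \
        "$dts_file"
}
```

Typical use would be to decompile the board's DTB with `dtc -I dtb -O dts`, run `set_phy_mode /tmp/gokrazy.dts rgmii`, and recompile with `dtc -I dts -O dtb` before copying the result to the boot partition.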

lategoodbye commented 4 years ago

@stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?

@nullr0ute Does phy-mode = "rgmii" also work for you?

stapelberg commented 4 years ago

> Could you please confirm that you were using a mainline kernel + DTS?

Hereby confirmed, yes.

nullr0ute commented 4 years ago

> @stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?
>
> @nullr0ute Does phy-mode = "rgmii" also work for you?

We're currently using whatever the upstream default is. Looking upstream, that's rgmii-rxid. I can test changing it to rgmii in the DT when I get a moment.

lategoodbye commented 4 years ago

@stapelberg Could you please doublecheck that Linux 5.5 has the same behavior?

stapelberg commented 4 years ago

Just checked with Linux 5.5.13. The issue is the same when not overriding phy-mode, and is fixed the same way when setting phy-mode=rgmii.

lategoodbye commented 4 years ago

Thanks. I will take care of the upstream patch.

Is it okay to add you as a bug reporter to the patch?

stapelberg commented 4 years ago

Yes. Thanks for taking care of the upstream fix!

lategoodbye commented 4 years ago

Looks like the same issue here: https://github.com/raspberrypi/linux/issues/3417

lategoodbye commented 4 years ago

So I tested the change on 3 RPi 4 B boards against next-20200411 (multi_v7_defconfig), and RGMII fails on two of the three.

| MAC address       | PHY mode   | Result |
|-------------------|------------|--------|
| DC:A6:32:23:54:85 | RGMII      | FAIL   |
| DC:A6:32:23:54:85 | RGMII-RXID | OKAY   |
| B8:27:EB:FB:D8:28 | RGMII      | FAIL   |
| B8:27:EB:FB:D8:28 | RGMII-RXID | OKAY   |
| DC:A6:32:3E:F2:35 | RGMII      | OKAY   |
| DC:A6:32:3E:F2:35 | RGMII-RXID | OKAY   |

Based on this result I cannot send the suggested change as a patch.

@stapelberg Could you please try current linux-next? Was RGMII the only PHY mode (there are 4) which worked for you?

stapelberg commented 4 years ago

> Was RGMII the only PHY mode (there are 4) which worked for you?

Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?

stapelberg commented 4 years ago

Okay, here are my test results:

Linux 5.6.3:

| MAC               | PHY mode   | dmesg                     | Result |
|-------------------|------------|---------------------------|--------|
| dc:a6:32:02:xx:yy | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL   |

linux-next-20200413:

| MAC               | PHY mode   | dmesg                     | Result |
|-------------------|------------|---------------------------|--------|
| dc:a6:32:02:xx:yy | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii      | external RGMII (no delay) | OKAY   |
| dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL   |
| dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL   |

In summary: on my three different Raspberry Pi 4 devices (one with 4G, the others with 2G of memory), only phy-mode rgmii works, both with Linux 5.6.3 and with today’s next-20200413.
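
The dmesg column above records how the bcmgenet driver reports the interface it configured. A small sketch of pulling that description out of the kernel log is shown below; `genet_phy_config` is a hypothetical helper, and the message format it matches is assumed from the bcmgenet log line quoted later in this thread (`configuring instance for ...`), which may differ on other kernel versions.

```shell
#!/bin/sh
# genet_phy_config
# Hypothetical helper: reads kernel log text on stdin and prints the
# interface description from bcmgenet's "configuring instance for" message.
# Prints nothing if no such line is present.
genet_phy_config() {
    sed -n 's/.*bcmgenet .*: configuring instance for \(.*\)$/\1/p'
}
```

On a running system this would be used as `dmesg | genet_phy_config`, which makes it easy to record the effective mode next to each OKAY/FAIL result.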

lategoodbye commented 4 years ago

> > Was RGMII the only PHY mode (there are 4) which worked for you?
>
> Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?

Correct

lategoodbye commented 4 years ago

@stapelberg Which kernel configuration did you use for your tests?

stapelberg commented 4 years ago

> Correct

Okay. Do you want me to test rgmii-id as well, or are the results above regarding rgmii{,-rxid,-txid} enough to work with?

> @stapelberg Which kernel configuration did you use for your tests?

make defconfig + https://github.com/gokrazy/kernel/blob/c3e1e48e481e208f95a9304166e9e75956552587/cmd/gokr-build-kernel/build.go#L17

I also attached the resulting /proc/config.gz for your convenience: config.gz

lategoodbye commented 4 years ago

Could you please retest Linux 5.6, but 32-bit and with only multi_v7_defconfig (without any modifications)? Sorry, currently I don't have the time to set up a working 64-bit environment.

stapelberg commented 4 years ago

Sorry, testing 32-bit is too much effort for me. gokrazy was only ever targeting 64-bit.

I can test multi_v7_defconfig, but since I don’t use loadable modules, I’d need to do some modifications.

lategoodbye commented 4 years ago

Okay, I will try to test with builtin on 32 bit.

Is gokrazy ready for RPi 4 yet?

stapelberg commented 4 years ago

> Is gokrazy ready for RPi 4 yet?

It works as far as I can tell, but I haven’t installed a Raspberry Pi 4 into the continuous integration setup yet. https://github.com/gokrazy/gokrazy/issues/48 tracks these 2 remaining issues.

lategoodbye commented 4 years ago

> Okay, I will try to test with builtin on 32 bit.

I tested it, but it didn't make any difference.

lategoodbye commented 4 years ago

@stapelberg What is the minimum version of Go I need to install for gokrazy?

stapelberg commented 4 years ago

Not entirely sure. The current stable version (Go 1.14) definitely works and I’d recommend using it. We don’t usually test with older versions. The most likely failure scenario is that our code uses methods not yet available in your version of Go, which would result in a compile-time error. In other words: try it and see, if you’re adventurous :)

It’s quick & easy to install into your home dir (see https://golang.org/doc/install), in case your OS doesn’t provide Go 1.14.

lategoodbye commented 4 years ago

Okay, I managed to get it working on my RPi 4. At least I can confirm that one of the Pis which required rgmii-rxid with multi_v7_defconfig / Raspbian works fine with rgmii under gokrazy:

[ 3.289489] bcmgenet fd580000.ethernet: configuring instance for external RGMII (no delay)

So we can definitely exclude a hardware issue.

@stapelberg It would be really nice to have access via debug UART / busybox.

stapelberg commented 4 years ago

> @stapelberg It would be really nice to have access via debug UART / busybox.

I filed https://github.com/gokrazy/gokrazy/issues/54 just recently. For now, you can place https://t.zekjur.net/sh (statically compiled busybox) onto the permanent partition (4th partition), either from your computer with an SD card reader, or interactively via breakglass: https://github.com/gokrazy/breakglass

lategoodbye commented 4 years ago

@stapelberg Sorry, I don't have the time for testing. But I think I've found the real issue. The MII PHY is not enabled in your config.

Please try to enable CONFIG_BROADCOM_PHY. Big thanks to Marek Szyprowski for finding this issue.

stapelberg commented 4 years ago

Aha, thank you! Let me verify this real quick.

Yep:

breakglass # gunzip -c /proc/config.gz | grep BROADCOM_PHY
# CONFIG_BROADCOM_PHY is not set

stapelberg commented 4 years ago

You’re right! Thanks very much. I pushed https://github.com/gokrazy/kernel/commit/82e30a7d5160d27b7725f28d7eada4894fc2a4e5 and verified it fixes it on my devices. Note that once I enabled CONFIG_BROADCOM_PHY, I had to also drop the phy-mode patch and go back to the default rgmii-rxid, otherwise the network would not be stable.

Should there be a dependency in the kernel build system which enforces this setup, if this is the desired state?
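
Until the build system enforces anything, a check like the `grep` above could be scripted as a guard in a kernel build pipeline. The sketch below is only an illustration: `kconfig_state` is a hypothetical helper, not part of the kernel's own tooling, and it only inspects the text of a config file without resolving Kconfig dependencies.

```shell
#!/bin/sh
# kconfig_state OPTION
# Hypothetical helper: reads a kernel .config on stdin and prints
#   y  if CONFIG_<OPTION> is built in,
#   m  if it is built as a module,
#   n  if it is "not set" or absent.
kconfig_state() {
    opt=$1
    state=$(sed -n "s/^CONFIG_${opt}=\([ym]\)\$/\1/p" | head -n 1)
    echo "${state:-n}"
}
```

For example, `gunzip -c /proc/config.gz | kconfig_state BROADCOM_PHY` reports the state on a running system. A setup without loadable modules, like the one described in this thread, would want the answer to be `y` rather than `m`.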

lategoodbye commented 4 years ago

The problem is that the Ethernet PHY is board specific, so we cannot really enforce a dependency. But there is an ongoing discussion.