stapelberg closed this issue 4 years ago
If you still have that Buster Lite image around, can you run sudo rpi-update on it and retest? If that works, try BRANCH=next sudo rpi-update to pull in the latest downstream 5.4 kernel.
Since you're up to building your own kernel, you can try reverting to the most recent upstream 5.5 kernel (tag v5.5.15) in an attempt to narrow down when the problem started.
I'll be looking at a checksum offload issue later, so I should be able to confirm that the GENET is vaguely functional on 5.6, but https://github.com/raspberrypi/linux/issues/3523 suggests that it is.
Thanks for the tip, I’ll try that later today!
For reference I'm not seeing issues with 5.6.2 on Fedora (aarch64 only, ARMv7 has other issues I need to investigate) on the upstream genet driver.
@stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.
> @stapelberg Just guessing: could you please try the other RGMII PHY modes by changing your devicetree? I assume you are currently running "rgmii-rxid" with the upstream DTS.
Yeah, you’re right:
% grep phy-mode /tmp/*.dts
/tmp/gokrazy.dts: phy-mode = "rgmii-rxid";
/tmp/raspbian.dts: phy-mode = "rgmii";
After changing it to rgmii, the network seems to work!
Thank you so much!
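For anyone repeating this experiment, here is a minimal sketch of switching phy-mode in a decompiled devicetree source. The function name set_phy_mode and all file paths are illustrative assumptions, not something used in this thread. It rewrites every phy-mode property in the file, which is only appropriate if the file has a single Ethernet node.

```shell
#!/bin/sh
# Sketch: rewrite the phy-mode property in a decompiled devicetree source.
# Usage: set_phy_mode FILE MODE
# FILE is a hypothetical path such as /tmp/board.dts.
set_phy_mode() {
    file=$1
    mode=$2
    # Only accept the four RGMII variants discussed in this thread.
    case $mode in
        rgmii|rgmii-id|rgmii-rxid|rgmii-txid) ;;
        *) echo "unsupported phy-mode: $mode" >&2; return 1 ;;
    esac
    # Replace the quoted value of each phy-mode property, keeping indentation.
    sed "s/\(phy-mode[[:space:]]*=[[:space:]]*\)\"[^\"]*\"/\1\"$mode\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Possible round trip with dtc (paths are placeholders):
#   dtc -I dtb -O dts -o /tmp/board.dts bcm2711-rpi-4-b.dtb
#   set_phy_mode /tmp/board.dts rgmii
#   dtc -I dts -O dtb -o bcm2711-rpi-4-b.dtb /tmp/board.dts
```

Afterwards, grep phy-mode on the edited file should show the new value.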
@stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?
@nullr0ute Does phy-mode = "rgmii" also work for you?
> Could you please confirm that you were using a mainline kernel + DTS?
Hereby confirmed, yes.
> @stapelberg I prefer to fix it in the mainline kernel. Could you please confirm that you were using a mainline kernel + DTS?
> @nullr0ute Does phy-mode = "rgmii" also work for you?
We're currently using whatever the upstream default is; looking upstream, that's rgmii-rxid. I can test changing it to rgmii in the DT when I get a moment.
@stapelberg Could you please double-check that Linux 5.5 has the same behavior?
Just checked with Linux 5.5.13. The issue is the same when not overriding phy-mode, and is fixed the same way when setting phy-mode=rgmii.
Thanks. I will take care of the upstream patch.
Is it okay to add you as a bug reporter to the patch?
Yes. Thanks for taking care of the upstream fix!
Looks like the same issue here: https://github.com/raspberrypi/linux/issues/3417
So I tested the change on 3 RPi 4 B boards against next-20200411 (multi_v7_defconfig), and it fails in most cases.
MAC Address | PHY mode | Result |
---|---|---|
DC:A6:32:23:54:85 | RGMII | FAIL |
DC:A6:32:23:54:85 | RGMII-RXID | OKAY |
B8:27:EB:FB:D8:28 | RGMII | FAIL |
B8:27:EB:FB:D8:28 | RGMII-RXID | OKAY |
DC:A6:32:3E:F2:35 | RGMII | OKAY |
DC:A6:32:3E:F2:35 | RGMII-RXID | OKAY |
Based on this result, I cannot send the suggested change as a patch.
@stapelberg Could you please try current linux-next? Was RGMII the only PHY mode (there are 4) which worked for you?
> Was RGMII the only PHY mode (there are 4) which worked for you?
Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?
Okay, here are my test results:
Linux 5.6.3:
MAC | PHY mode | dmesg | result |
---|---|---|---|
dc:a6:32:02:xx:yy | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL |
dc:a6:32:03:yy:zz | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL |
dc:a6:32:02:zz:aa | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL |
linux-next-20200413:
MAC | PHY mode | dmesg | result |
---|---|---|---|
dc:a6:32:02:xx:yy | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:02:xx:yy | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:02:xx:yy | rgmii-txid | external RGMII (TX delay) | FAIL |
dc:a6:32:03:yy:zz | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:03:yy:zz | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:03:yy:zz | rgmii-txid | external RGMII (TX delay) | FAIL |
dc:a6:32:02:zz:aa | rgmii | external RGMII (no delay) | OKAY |
dc:a6:32:02:zz:aa | rgmii-rxid | external RGMII (RX delay) | FAIL |
dc:a6:32:02:zz:aa | rgmii-txid | external RGMII (TX delay) | FAIL |
In summary: on my three different Raspberry Pi 4 devices (one with 4 GB, the others with 2 GB of memory), only phy-mode rgmii works, both with Linux 5.6.3 and with today’s next-20200413.
> > Was RGMII the only PHY mode (there are 4) which worked for you?
>
> Can you clarify which 4 values are interesting here? Are the values rgmii, rgmii-rxid, rgmii-txid, rgmii-id, or did I read this wrong?
Correct
@stapelberg Which kernel configuration did you use for your tests?
> Correct
Okay. Do you want me to test rgmii-id as well, or are the results above regarding rgmii{,-rxid,-txid} enough to work with?
> @stapelberg Which kernel configuration did you use for your tests?
make defconfig + https://github.com/gokrazy/kernel/blob/c3e1e48e481e208f95a9304166e9e75956552587/cmd/gokr-build-kernel/build.go#L17
I also attached the resulting /proc/config.gz for your convenience: config.gz
Could you please retest Linux 5.6, but 32-bit and with only multi_v7_defconfig (without any modifications)? Sorry, currently I don't have the time to set up a working 64-bit environment.
Sorry, testing 32-bit is too much effort for me. gokrazy was only ever targeting 64-bit.
I can test multi_v7_defconfig, but since I don’t use loadable modules, I’d need to do some modifications.
Okay, I will try to test with builtin on 32-bit.
Is gokrazy ready for RPi 4 yet?
> Is gokrazy ready for RPi 4 yet?
It works as far as I can tell, but I haven’t installed a Raspberry Pi 4 into the continuous integration setup yet. https://github.com/gokrazy/gokrazy/issues/48 tracks these 2 remaining issues.
> Okay, I will try to test with builtin on 32-bit.

I tested it, but it didn't make any difference.
@stapelberg What is the minimum version of Go I need to install for gokrazy?
Not entirely sure. The current stable version (Go 1.14) definitely works and I’d recommend using it. We don’t usually test with older versions. The most likely failure scenario is that our code uses methods not yet available in your version of Go, which would result in a compile-time error. In other words: try it and see, if you’re adventurous :)
It’s quick & easy to install into your home dir (see https://golang.org/doc/install), in case your OS doesn’t provide Go 1.14.
Okay, I managed to get it working on my RPi 4. At least I can confirm that one of the Pis which required rgmii-rxid with multi_v7_defconfig / Raspbian works fine with rgmii under gokrazy:
[ 3.289489] bcmgenet fd580000.ethernet: configuring instance for external RGMII (no delay)
So we can definitely exclude a hardware issue.
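To compare boards quickly, the mode the driver reports can be pulled out of the kernel log. This is a minimal sketch that assumes the log line format shown above; genet_phy_mode is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the interface mode reported by bcmgenet log lines read from stdin.
# Assumes the "configuring instance for ..." wording quoted in this thread.
genet_phy_mode() {
    sed -n 's/.*bcmgenet .*configuring instance for \(.*\)$/\1/p'
}

# Usage on a running system:
#   dmesg | genet_phy_mode
```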
@stapelberg It would be really nice to have access via debug UART / busybox.
> @stapelberg It would be really nice to have access via debug UART / busybox.
I filed https://github.com/gokrazy/gokrazy/issues/54 just recently. For now, you can place https://t.zekjur.net/sh (statically compiled busybox) onto the permanent partition (4th partition), either from your computer with an SD card reader, or interactively via breakglass: https://github.com/gokrazy/breakglass
@stapelberg Sorry, I don't have the time for testing. But I think I've found the real issue. The MII PHY is not enabled in your config.
Please try to enable CONFIG_BROADCOM_PHY. Big thanks to Marek Szyprowski for finding this issue.
Aha, thank you! Let me verify this real quick.
Yep:
breakglass # gunzip -c /proc/config.gz | grep BROADCOM_PHY
# CONFIG_BROADCOM_PHY is not set
You’re right! Thanks very much. I pushed https://github.com/gokrazy/kernel/commit/82e30a7d5160d27b7725f28d7eada4894fc2a4e5 and verified it fixes it on my devices. Note that once I enabled CONFIG_BROADCOM_PHY, I had to also drop the phy-mode patch and go back to the default rgmii-rxid, otherwise the network would not be stable.
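A small helper can make this check repeatable across kernel builds. The sketch below generalizes the gunzip/grep check to report whether an option is built in, modular, or off. The name config_option_state is an assumption for illustration, and any option not set to y or m is treated as off.

```shell
#!/bin/sh
# Sketch: print y, m, or n for a kernel option, given a config file.
# Usage: config_option_state FILE OPTION   (OPTION without the CONFIG_ prefix)
config_option_state() {
    cfg=$1
    opt=$2
    # Read gzipped configs (such as /proc/config.gz) transparently.
    case $cfg in
        *.gz) reader="gunzip -c" ;;
        *)    reader="cat" ;;
    esac
    val=$($reader "$cfg" | sed -n "s/^CONFIG_$opt=\([ym]\)\$/\1/p")
    # "# CONFIG_FOO is not set" and a missing option both count as n here.
    echo "${val:-n}"
}

# Usage on a running system:
#   config_option_state /proc/config.gz BROADCOM_PHY
```

In a kernel source tree, running scripts/config --enable BROADCOM_PHY followed by make olddefconfig is one way to switch the option on before rebuilding.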
Should there be a dependency in the kernel build system which enforces this setup, if this is the desired state?
The problem is that the Ethernet PHY is board specific, so we cannot really enforce a dependency. But there is an ongoing discussion.
[Filing this as a separate issue to avoid derailing issue #43. Please forgive me if this is not the right place, but I figured y’all would know the most about ethernet on the Raspberry Pi 4 with the upstream linux kernel.]
I’m running upstream Linux 5.6.2 on a Raspberry Pi 4 Model B (in 64 bit mode) and I’m having some trouble getting ethernet to work.
The link comes up, and packets are received: I can see the packets in tcpdump, and the link gets an IPv6 address based on router advertisements.
However, any packets that are sent (I can see them in tcpdump) are not seen on the network by other devices.
Here’s what I have tried/checked so far:
Any ideas for what might be wrong here, or what I could try to further diagnose this issue?
Thank you very much in advance!
cc @lategoodbye @pelwell
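The symptom in this report (receive works, transmitted packets never reach the network) can be cross-checked against the kernel's per-interface counters in sysfs. This is a rough sketch; iface_counters is a made-up helper, and its optional second argument exists only so the function can be pointed at a test directory. If tx_packets keeps rising with no tx_errors while peers see nothing, that would suggest the loss happens below the network stack, for example in the MAC/PHY setup.

```shell
#!/bin/sh
# Sketch: print rx/tx packet and tx error counters for a network interface.
# Usage: iface_counters IFACE [SYSFS_ROOT]
iface_counters() {
    iface=$1
    root=${2:-/sys/class/net}
    for c in rx_packets tx_packets tx_errors; do
        printf '%s=%s\n' "$c" "$(cat "$root/$iface/statistics/$c")"
    done
}

# Usage on a running system:
#   iface_counters eth0
```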