Open jkrautmacher opened 3 months ago
Updated bug report after it turned out that Zephyr v3.6.0 is affected too.
The bug is reproduced on nucleo_f767zi but not on stm32f769i_disco, so I tried to compare the two:
    mac: ethernet@40028000 {
        compatible = "st,stm32-ethernet";
        reg = <0x40028000 0x8000>;
        interrupts = <61 0>;
        clock-names = "stmmaceth", "mac-clk-tx",
                      "mac-clk-rx", "mac-clk-ptp";
        clocks = <&rcc STM32_CLOCK_BUS_AHB1 0x02000000>,
                 <&rcc STM32_CLOCK_BUS_AHB1 0x04000000>,
                 <&rcc STM32_CLOCK_BUS_AHB1 0x08000000>,
                 <&rcc STM32_CLOCK_BUS_AHB1 0x10000000>;
        status = "disabled";
    };
The difference:
The STM32F769I-DISCO board includes additional components for PoE, such as the PM8800A PoE controller, transformers, and various passive components.
The NUCLEO-F767ZI board lacks these components, which might affect the stability and performance of the Ethernet connection.
Impact on Ethernet:
@jkrautmacher can you please confirm that?
I would be glad to help but am unsure what to confirm. As far as I understand, the current theory is that the power supply of the PHY on nucleo_f767zi might not be stable enough during init, so that initialization fails.
If this is correct, my next debugging step would be to connect an oscilloscope to the supply voltage of the PHY and observe it during init. That, together with a debug GPIO that the kernel code toggles right before and after Ethernet init, should confirm or refute this theory. The next step would be to either fix the hardware design or add a workaround to the Zephyr kernel (longer delays or similar).
I am lacking two things to verify the theory:
nucleo_f767zi
Is there perhaps more public information about the board than UM1974 Rev 10 that I overlooked? I could maybe improve the oscilloscope situation, but it would likely be much faster if you could do that at ST, if this is an option.
Since it was easy for me to test, I checked how the board / firmware behaves with an external power supply instead of powering it via USB. I moved the jumper on JP3 from U5V to VIN-5V and provided 12 VDC to VIN and GND on CN8.
Result is: Success rate is 39 % (39 ok and 61 failed)
So this did not fix it. Updated the initial bug report accordingly.
Describe the bug
While working on a quite minimal Zephyr firmware for the nucleo_f767zi board I noticed that the network connection is unreliable. As shown below, this is reproducible with the net/telnet sample.

The bug can be noticed by the following indicators:
stm_eth thread has significantly lower stack usage (see shell output of kernel stacks)

The bug appears after roughly 40 % of the boot processes. So far it has not been observed that network communication breaks later if it was operational directly after booting the board.
To Reproduce
Steps to reproduce the behavior:
nucleo_f767zi board with an Ethernet cable to a Linux PC
192.0.2.2/24 on the PC
st-info tool (see e.g. source repo)
samples/net/telnet as described in the Getting Started Guide

Expected behavior
It is expected that the used script reports 100 % success rate.
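The test script itself is not reproduced in this report. As a rough illustration only, a loop of this kind might be sketched as below. All function names are hypothetical, the board reset is left as a caller-supplied callback because the reset mechanism is not shown here, and using a TCP connect to the telnet port as the "ok" criterion is an assumption, not necessarily what the real script checks.

```python
import socket
from typing import Callable


def telnet_port_open(host: str, port: int = 23, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open telnet port is taken as "network is operational".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def summarize(ok: int, failed: int) -> str:
    """Format the tally in the same style as the summary lines in this report."""
    total = ok + failed
    rate = round(100 * ok / total) if total else 0
    return f"Success rate is {rate} % ({ok} ok and {failed} failed)"


def run_trials(
    iterations: int,
    reset_board: Callable[[], None],
    probe: Callable[[], bool],
) -> str:
    """Reset the board, probe the network, repeat, and return the summary line."""
    ok = 0
    for _ in range(iterations):
        reset_board()  # hypothetical hook, e.g. a reset via the debug probe
        if probe():
            ok += 1
    return summarize(ok, iterations - ok)
```

A call could look like `run_trials(100, my_reset, lambda: telnet_port_open(board_ip))`, where `my_reset` and `board_ip` are placeholders for the local setup. A real loop would also need to wait for the board to finish booting before probing.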
Impact
This bug is a showstopper. A firmware with such an unreliable network connection is useless.
Logs and console output
The script summarizes a test with 100 iterations on my setup with:
Success rate is 39 % (39 ok and 61 failed) for zephyr v3.6.0
Success rate is 44 % (44 ok and 56 failed) for zephyr v3.7.0
Success rate is 49 % (49 ok and 51 failed) for zephyr v3.7.0 with external power supply (12 V via VIN)

Environment (please complete the following information):
0.16.5
v3.7.0 and v3.6.0