zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

drivers: nxp_enet: Add configurability of ENET REFCLK #51107

Status: Open. jeolwang opened this issue 2 years ago.

jeolwang commented 2 years ago

System version: Zephyr 3.1.0; hardware: mimxrt1064_evk. When running the DHCP sample, the ping test is unstable. In addition, after adding socket communication logic, socket communication was unstable and slow: after receiving and sending a few KB of data, the network seemed to become unavailable and ping failed.

The following is the device log:

uart:~$ *** Booting Zephyr OS build 80711a54377e  ***

[00:00:04.602,050] <inf> net_config: Interface 1 (0x80002350) coming up
[00:00:04.611,267] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.
[00:00:04.620,452] <inf> net_config: Running dhcpv4 client...
[00:00:04.686,737] <err> eth_mcux: ENET_GetRxFrameSize return: 4001

[00:00:07.634,460] <inf> net_dhcpv4: Received: 192.168.1.119
[00:00:07.642,730] <inf> net_config: IPv4 address: 192.168.1.119
[00:00:07.651,245] <inf> net_config: Lease time: 172800 seconds
[00:00:07.659,729] <inf> net_config: Subnet: 255.255.255.0
[00:00:07.667,816] <inf> net_config: Router: 192.168.1.1
uart:~$ main,206: This is a test...
main,211: Ethernet connected !
TCP echo server [0] waits for a connection on port 7...
[00:00:08.668,518] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:09.641,021] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:09.642,669] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:10.613,708] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:11.586,242] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:16.477,447] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
[00:00:17.978,698] <err> eth_mcux: ENET_GetRxFrameSize return: 4001
danieldegrasse commented 1 year ago

@jeolwang Do you see this error on Zephyr 3.2? I ask because I believe that https://github.com/zephyrproject-rtos/zephyr/pull/48752 might have solved this issue.

IbeVdV commented 1 year ago

I have the same issue on Zephyr 3.2 and Zephyr 3.3. Is there any progress? Can someone tell me what is or should be triggering the error?

jameswalmsley-cpi commented 1 year ago

@IbeVdV @dleach02 I am still experiencing this issue on Zephyr 3.4. Also, my Ethernet performance is very poor, no matter how I configure the driver.

I've sent a request to NXP for assistance. I'll post here if I find anything out.

Let me know if you have suggestions.

DerekSnell commented 1 year ago

@jameswalmsley-cpi, we have not seen this issue and would like to replicate it. Are you able to provide the details needed to replicate it, preferably on an NXP evaluation board? Which sample shows the issue? Are any modifications to the sample needed to expose the issue? Thanks

jameswalmsley commented 1 year ago

@DerekSnell I have the mimxrt1064_evk board and will try to reproduce the issue on it. So far, I can build and run the following (from zephyr/main):

west build -b mimxrt1064_evk zephyr/samples/net/zperf/    

On the Zephyr shell:

zperf udp download

On the Linux host:

iperf -V -u -c fe80::4:9fff:fe39:3ca4%enp0s20f0u1u1 
------------------------------------------------------------
Client connecting to fe80::4:9fff:fe39:3ca4%enp0s20f0u1u1, UDP port 5001
Sending 1450 byte datagrams, IPG target: 11062.62 us (kalman adjust)
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  1] local fe80::935:d1d6:7fda:bd83 port 49230 connected with fe80::4:9fff:fe39:3ca4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0119 sec  1.25 MBytes  1.05 Mbits/sec
[  1] Sent 906 datagrams
read failed: Connection refused
read failed: Connection refused
read failed: Connection refused

The throughput seems very slow. I have tried changing many settings (DTCM, hardware acceleration, etc.), and I always get a very similar result of 1.05 Mbits/sec. I get the same result on our own board too.

I will try to reproduce the GetRxFrameSize issue now.

DerekSnell commented 1 year ago

Hi @jameswalmsley, thanks for sharing this. We will test it and see if we get similar results.

jameswalmsley commented 1 year ago

@DerekSnell I have created a PR to fix some issues with the driver. Perhaps you can help me get it into shape :) https://github.com/zephyrproject-rtos/zephyr/pull/60073

The ENET_GetRxFrameSize() errors came from the eth_mcux driver only supporting a REFCLK generated by the RT1064 (as on the reference board).

I have added changes in #60073 to allow the REFCLK to be an input, and to support both 25 MHz and 50 MHz crystals on the PHY.

I've also added other changes to disable cache maintenance in the HAL driver when DTCM is used for all buffers.

Unfortunately I was not able to find the source of the performance issue.

danieldegrasse commented 1 year ago

Reopening this issue, as it is clearly still a problem on this platform.

DerekSnell commented 1 year ago

Hi @jameswalmsley, it sounds like you resolved the ENET_GetRxFrameSize() errors with the REFCLK changes in your PR https://github.com/zephyrproject-rtos/zephyr/pull/60073. Since the ENET_GetRxFrameSize() errors were the original problem reported in this issue, and you are still seeing poor performance, I created a separate issue, https://github.com/zephyrproject-rtos/zephyr/issues/60144, to continue tracking the performance problems in case your current PR closes this one. Thanks for all your contributions!

DerekSnell commented 1 year ago

Hi @jameswalmsley, for some reason GitHub will not let me @mention you on https://github.com/zephyrproject-rtos/zephyr/issues/60144, but it will let me mention you in this issue.

Based on this comment, which resolves the performance issue, does the latest main resolve your performance issue as well?

github-actions[bot] commented 11 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

decsny commented 6 months ago

Please also try the new driver.

jwtaylor24 commented 5 months ago

I am also experiencing this issue. I am using the new NXP "experimental" driver, with a few modifications.

We are running Zephyr 3.6.0 and have an NXP 1176 on a custom board. Also, instead of a PHY, we are using a 5-port managed Ethernet switch from Microchip (KSZ8775CLX, https://ww1.microchip.com/downloads/en/DeviceDoc/00002129C.pdf).

In our architecture, the 1176 MAC is connected to the SW5 RMII MAC on the switch, so I removed the PHY initialization part of the driver (since this is MAC-to-MAC communication). The Ethernet switch uses an external 25 MHz crystal.

We are still getting the following error, and a lot of dropped packets (roughly 20-30% dropped over UDP):

[00:00:04.148,000] <err> eth_nxp_enet_mac: eth_nxp_enet_rx: ENET_GetRxFrameSize return: 4001

jwtaylor24 commented 5 months ago

One comment / clarification: I tried running my software on the NXP MIMXRT_1170_EVK development board (obviously I had to put the PHY initialization back in the code), and I had no issues at all. I don't drop packets, and I don't get any error messages on the console.

Is there something I am missing in a "MAC-to-MAC" configuration, or do you think I might have a hardware issue?

jwtaylor24 commented 4 months ago

For anyone reading this, we resolved our issue.

It turns out the 1176 was driving the 50 MHz ENET REF clock for RMII mode, and the Microchip switch was also driving the ENET REF clock, so we had clock contention. Once the 1176 IOMUX GPR4 register was configured to accept the REF clock as an input, the errors went away and Ethernet is now fully functional.

decsny commented 4 months ago

Since multiple people who had these errors found that changing the REFCLK configuration fixes them, I'm going to convert this into an enhancement request to add configurability of the REFCLK. @jeolwang, if you find you have a different issue, let us know.