gptp does not work well on NXP rt series platform

zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.

https://docs.zephyrproject.org

Apache License 2.0

9.83k stars 6.06k forks source link

gptp does not work well on NXP rt series platform #33747

Closed kevin137 closed 8 months ago

kevin137 commented 3 years ago

Description of bug

Problem with the gptp_event_capture() function in the gPTP demo in ZephyrOS on the RT1020EVK: time.second member of net_ptp_time struct is updated correctly, but time.nanosecond is NOT. The same function in the same demo on the FRDM-K64F works correctly.

Background

We are trying to use gPTP (IEEE 802.1AS) on the i.MXRT platform to synchronize timestamps across devices in a sensor network. All the hardware documentation for the i.MXRT and App Notes such as AN12149 (https://www.nxp.com/docs/en/nxp/application-notes/AN12149.pdf) indicate that this should be possible. We are interested in using ZephyrOS because we feel it should work well with the way we develop embedded systems, which are at the moment all based on Linux and Yocto. Zephyr seems to have good network support, and gPTP support, at least on some platforms (https://docs.zephyrproject.org/latest/reference/networking/gptp.html). To kick-start this process, and to evaluate whether gPTP/ZephyrOS/i.MXRT is a good technical solution for synchronization, we asked a graduate student at Universitat Politècnica de Catalunya, Santi Prats, to help us.

Santi advanced significantly, first on installing the SDK and tools, then with compiling Zephyr and applications for i.MXRT and Kinetis, also with PTP and gPTP synchronization purely in Linux. Then he encountered a problem which stumped him and us.

To Reproduce

On the FRDM-K64F board (https://docs.zephyrproject.org/latest/boards/arm/frdm_k64f/doc/index.html), he is able to extract timestamps from the gPTP system with nanosecond resolution. But on the RT1020EVK (https://docs.zephyrproject.org/latest/boards/arm/mimxrt1020_evk/doc/index.html), using the same code, with the same libraries and tools, the nanosecond member of the net_ptp_time struct is NOT being filled, and therefore we are only able able to read the PTP Hardware Clock with a resolution of seconds. Reproducing the issue is as simple as adding a few extra LOG_INF() calls to the gptp demo (see below), compiling it for the RT1020-EVK, flashing it, and running it.

Lightly modified demo code

/∗USER BEGIN INCLUDES∗/
#include <net/ptptime.h>
#include <sys/printk.h>
#include <sys/util.h>
/∗USER END INCLUDES∗/
/∗USER BEGIN VARIABLES∗/
static struct net_ptp_time slave_time;
//struct gptp_clk_src_time_invokeparams src_time_invoke_parameters;
bool gm present ;
int status ;
/∗USER END VARIABLES∗/
void main ( v o i d )
{
  /∗ USER BEGIN MAIN. C∗/
  while (1) {
  status=gptp_event_capture(&slave_time, &gm_present) ;
  LOG_INF( ” ” ) ;
  LOG_INF( ” ” ) ;
  LOG_INF( ” Standard info plot:” ) ;
  LOG_INF( ”gPTP event capture is %i ” , status ) ; // 0 es NO ERROR
  LOG_INF( ”gPTP time second %u” , slave_time.second ) ;
  LOG_INF( ” ” ) ;
  LOG_INF( ” Plot slave time SECONDS: ” ) ;
  LOG_INF( ”gPTP slave time second(u) %u” , slave_time.second ) ;
  LOG_INF( ”gPTP slave time second (X) 0x %X” , slave_time.second ) ;
  LOG_INF( ” ” ) ;
  LOG_INF( ”Plot slave time NANOSECONDS: ” ) ;
  LOG_INF( ”gPTP slave time nanosecond (u) %u” , slave_time.nanosecond ) ;
  LOG_INF( ”gPTP slave time nanosecond (X) 0x %X” , slave_time.nanosecond ) ;
  LOG_INF( ” ” ) ;
  LOG_INF( ” slave_time.second address : 0x %X” , &( slave_time.second ) ) ;
  LOG_INF( ” slave_time.nanosecond address : 0x %X” , &( slave_time.nanosecond ) ) ;
  kmsleep(1000) ; // sleep time in ms
  }
  /∗ USER END MAIN. C∗/
}

Build

$ west build -b mimxrt1020 evk samples/net/gptp/
$ west flash

Run

[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Standard info plot:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP time second 1614051136
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614051136
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x60347740
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC0
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC8
uart:~$

Expected behavior

The expected behavior is what happens running the same code on the FRDM-k64f:

[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Standard info plot:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:05:27.167,000] <inf> net_gptp_sample: gPTP time second 1614072200
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614072200
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x6034C988
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 906384492
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x3605646C
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC0
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC8
uart:~$

Note that on the FRDM-k64f, the nanosecond member IS being updated.

Impact We are unable to proceed with the development of the time synchronization element of our new sensor solution because of this issue. Santi is attempting to make equivalent functionality work in FreeRTOS to continue with his academic pursuits, but this is not a great solution for us, because other team members are developing other components of the sensor firmware in Zephyr. We the just getting started with Zephyr; we like what we see so far, but timestamp synchronization with gPTP is at the very heart of what we are trying to do with our sensors--if we can not make it work, we will need to change our strategy. We reached out to NXP through our distributor EBV, and they have confirmed that the gPTP stack and demo should work in Zephyr on the RT1020-EVK. They encouraged us to report the issue here.

GDB / console output

Using the gdb debugger, Santi isolated the differences in the progress of the call to gptp_event_capture to a single line of code that differs between the FRDM-K64F and the MIMXRT1020-EVK. The relevant part of his (attached) document describing the problem follows:

GDB debugging

To investigate the differences between what is happening on the FRDM-K64F and the MIMXRT1020-EVK, the GDB debugger was used. The following was executed from the Zephyr directory:

$ west debug

We enter into "debug mode"

(gdb) layout src
(gdb) advance gptp event capture

A position is set in the moment just before the execution of

status=gptp event capture(&slave time, &gm present)

At this instant,

(gdb) print slave time.second
(gdb) print slave time.nanoecond

Both return 0. We now step through to see execution of gptp_event_capture()

(gdb) step

Before the execution finishes, and analyzing the output of (gdb) step, it is seen that execution passes through the following files and in this order:

/include/arch/arm/aarch32/asm inline gcc.h line 56
/subsys/net/l2/ethernet/gptp/gptp user api.c line
/subsys/net/l2/ethernet/gptp/gptp user api.c line 62 
    #### [ The 1020EVK does NOT pass through line 62, unlike the FRDM-K64F ]
/subsys/net/l2/ethernet/gptp/gptp user api.c line 64
/subsys/net/l2/ethernet/gptp/gptp user api.c line 66
/subsys/net/l2/ethernet/ethernet.c line 1059
/include/net/net if.h line 589
/subsys/net/l2/ethernet/ethernet.c line 1060
/subsys/net/l2/ethernet/ethernet.c line 1062
/include/net/net if.h line 555
/subsys/net/l2/ethernet/ethernet.c line 1070
/drivers/ethernet/eth mcux.c line 1083
/subsys/net/l2/ethernet/ethernet.c line 1074
/include/net/net if.h line 1078
/include/net/net if.h line 589
/modules/hal/cmsis/CMSIS/Core/Include/cmsis gcc.h line 453
/modules/hal/nxp/mcux/drivers/imx/fsl common.h line 566
/modules/hal/cmsis/CMSIS/Core/Include/cmsis gcc.h l´ınea 209
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2857
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2826
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2828
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2833 - 2835
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2838
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2860 - 2862
/modules/hal/nxp/mcux/drivers/imx/fsl enet.c line 2866
/modules/hal/cmsis/CMSIS/Core/Include/cmsis gcc.h line 481
/drivers/ethernet/eth mcux.c line 1439
/drivers/ethernet/eth mcux.c line 1440
/drivers/ethernet/eth mcux.c line 1441
/subsys/net/l2/ethernet/gptp/gptp user api.c line 69
/include/arch/arm/aarch32/asm inline gcc.h line 95
/subsys/net/l2/ethernet/gptp/gptp user api.c line 70
/samples/net/gptp/src/main.c

Finally,

(gdb) print slave time.second
(gdb) print slave time.nanoecond

Now, in the display of the time.seconds member, a correct value is seen, while the time.nanoseconds member still reads 0: debug_1020

In conclusion, we are seeing the desired behavior on the FRDM-K64F; we are able to read timestamps down to the nanosecond. The same application code, making the same api calls on the RT1020 is not returning nanoseconds, however, and it all seems to come down to line 62 in gptp_user_api.c

Environment:

OS: Linux
Toolchain: Zephyr SDK 0.12.2, West v0.9.0, cmake 3.16.3, pip 20.0.2, GDB
Commit SHA: 2857c2e9845a328e2bee2be9005071e46db8b8bb

Zephyr6.pdf

jukkar commented 3 years ago

/subsys/net/l2/ethernet/gptp/gptp user api.c line 62

[ The 1020EVK does NOT pass through line 62, unlike the FRDM-K64F ]

/subsys/net/l2/ethernet/gptp/gptp user api.c line 64

What does the "NOT pass" mean here? The line 62 is this one

    for (port = GPTP_PORT_START; port <= GPTP_PORT_END; port++) {

Do you mean that the in 1020EVK, the for loop is not entered? ~That would mean that GPTP_PORT_END which is (gptp_domain.default_ds.nb_ports + 1) which means that nb_ports variable is 0 which means that the system was not able to find any ptp clock in the system. That would indicate some issue in mcux driver if this is really the case.~ Edit: actually, because how the for loop is written <=, we should always enter the loop.

The gptp subsystem is not board specific so this indicates some issue in mcux driver and/or NXP HAL.

kevin137 commented 3 years ago

My interpretation was that the GPTP_PORT_END condition was not allowing the loop to be entered even one time, so it was getting optimized out by the compiler. Could it be that is a HAL issue then? We also have access to a couple of RT1050 EVKs, though they are in a different office. Would it be helpful to run the same test on a 1050?

jukkar commented 3 years ago

Would it be helpful to run the same test on a 1050?

Yes, that would be useful information to know if there are issues in 1050 too.

MaureenHelm commented 3 years ago

Could it be that is a HAL issue then?

i.MX RT and Kinetis use the same HAL driver (fsl_enet.c).

Would it be helpful to run the same test on a 1050?

Yes, please.

@hakehuang can you have a look?

hakehuang commented 3 years ago

@kevin137 do you follow the document in gptp sample document to set up the environment I find an issue on frdm_k64f and can reproduce it on mimxrt1060_evk, #29599

can you try the SDK ptp1588 example which can be download from kex.nxp.com, the log below shows that the board is ok. note: you need make a local loop back RT45 cable connected with board. see this diagram

besides, do you add the board setting to mimxrt1020_evk, in gpt sample as below

CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

Get the 1-th time 3 second, 813707720 nanosecond
 Get the 2-th time 3 second, 891518500 nanosecond
 Get the 3-th time 3 second, 969332420 nanosecond
 Get the 4-th time 4 second, 47146660 nanosecond
 Get the 5-th time 4 second, 124916860 nanosecond
 Get the 6-th time 4 second, 202730800 nanosecond
 Get the 7-th time 4 second, 280544740 nanosecond
 Get the 8-th time 4 second, 358358700 nanosecond
 Get the 9-th time 4 second, 436172620 nanosecond
 Get the 10-th time 4 second, 513986560 nanosecond
The 1 frame transmitted success! the timestamp is 4 second, 591861900 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 591862760 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 2 frame transmitted success! the timestamp is 4 second, 602166460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 602167320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 3 frame transmitted success! the timestamp is 4 second, 612474060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 612474920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 4 frame transmitted success! the timestamp is 4 second, 622781860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 622782720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 5 frame transmitted success! the timestamp is 4 second, 633089660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 633090520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 6 frame transmitted success! the timestamp is 4 second, 643397260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 643398120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 7 frame transmitted success! the timestamp is 4 second, 653705060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 653705920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 8 frame transmitted success! the timestamp is 4 second, 664012660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 664013520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 9 frame transmitted success! the timestamp is 4 second, 674320460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 674321320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 10 frame transmitted success! the timestamp is 4 second, 684628260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 684629120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 11 frame transmitted success! the timestamp is 4 second, 694979260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 694980120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 12 frame transmitted success! the timestamp is 4 second, 705330460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 705331320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 13 frame transmitted success! the timestamp is 4 second, 715681660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 715682520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 14 frame transmitted success! the timestamp is 4 second, 726032860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 726033720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 15 frame transmitted success! the timestamp is 4 second, 736384060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 736384920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 16 frame transmitted success! the timestamp is 4 second, 746735060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 746735920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 17 frame transmitted success! the timestamp is 4 second, 757086260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 757087120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 18 frame transmitted success! the timestamp is 4 second, 767437500 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 767438360 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 19 frame transmitted success! the timestamp is 4 second, 777788700 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 777789560 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 20 frame transmitted success! the timestamp is 4 second, 788139860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 788140720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60

kevin137 commented 3 years ago

Could it be that is a HAL issue then?

i.MX RT and Kinetis use the same HAL driver (fsl_enet.c).

Would it be helpful to run the same test on a 1050?

Yes, please.

@hakehuang can you have a look?

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

hakehuang commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?

CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

santi681 commented 3 years ago

@kevin137 do you follow the document in gptp sample document to set up the environment I find an issue on frdm_k64f and can reproduce it on mimxrt1060_evk, #29599

besides, do you add the board setting to mimxrt1020_evk, in gpt sample as below

CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

Get the 1-th time 3 second, 813707720 nanosecond
 Get the 2-th time 3 second, 891518500 nanosecond
 Get the 3-th time 3 second, 969332420 nanosecond
 Get the 4-th time 4 second, 47146660 nanosecond
 Get the 5-th time 4 second, 124916860 nanosecond
 Get the 6-th time 4 second, 202730800 nanosecond
 Get the 7-th time 4 second, 280544740 nanosecond
 Get the 8-th time 4 second, 358358700 nanosecond
 Get the 9-th time 4 second, 436172620 nanosecond
 Get the 10-th time 4 second, 513986560 nanosecond
The 1 frame transmitted success! the timestamp is 4 second, 591861900 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 591862760 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 2 frame transmitted success! the timestamp is 4 second, 602166460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 602167320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 3 frame transmitted success! the timestamp is 4 second, 612474060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 612474920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 4 frame transmitted success! the timestamp is 4 second, 622781860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 622782720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 5 frame transmitted success! the timestamp is 4 second, 633089660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 633090520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 6 frame transmitted success! the timestamp is 4 second, 643397260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 643398120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 7 frame transmitted success! the timestamp is 4 second, 653705060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 653705920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 8 frame transmitted success! the timestamp is 4 second, 664012660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 664013520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 9 frame transmitted success! the timestamp is 4 second, 674320460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 674321320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 10 frame transmitted success! the timestamp is 4 second, 684628260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 684629120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 11 frame transmitted success! the timestamp is 4 second, 694979260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 694980120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 12 frame transmitted success! the timestamp is 4 second, 705330460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 705331320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 13 frame transmitted success! the timestamp is 4 second, 715681660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 715682520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 14 frame transmitted success! the timestamp is 4 second, 726032860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 726033720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 15 frame transmitted success! the timestamp is 4 second, 736384060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 736384920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 16 frame transmitted success! the timestamp is 4 second, 746735060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 746735920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 17 frame transmitted success! the timestamp is 4 second, 757086260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 757087120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 18 frame transmitted success! the timestamp is 4 second, 767437500 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 767438360 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 19 frame transmitted success! the timestamp is 4 second, 777788700 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 777789560 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 20 frame transmitted success! the timestamp is 4 second, 788139860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 788140720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60

Yes, SDK ptp1588 example works as you mentioned for MIMXRT1020 and FRDM-k64f, although the PTP timestamp expected would be the same as in the timestamp in the master node (a Linux ptp4l implementation) as in Zephyr OS, right? Or why is the timestamp beginning at 0.0 seconds? Is the MIMXRT1020 or FRDM-k64f working as a master node? Is it possible to change that?

santi681 commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!

hakehuang commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!

@santi681 , you can look the the eth_mcux.c code and find the CONFIG_PTP_CLOCK_MCUX, this is the option to enalbe ptp time stamp.

hakehuang commented 3 years ago

Yes, SDK ptp1588 example works as you mentioned for MIMXRT1020 and FRDM-k64f, although the PTP timestamp expected would be the same as in the timestamp in the master node (a Linux ptp4l implementation) as in Zephyr OS, right? Or why is the timestamp beginning at 0.0 seconds? Is the MIMXRT1020 or FRDM-k64f working as a master node? Is it possible to change that?

the SDK exmaple only to use to check whether there is a board issue. it only examples to add timestamp to udp package, which is not a ptp example. In zephyr gptp case, according to the gptp protocal, when you connect two ptp instance, they will send out pdelay_req and then choose to be master or client, then mast starts to send sync packet. I do not have a workable linux host as ptp master. I tried to connect two boards, but it looks buggy as I reported in #29599

santi681 commented 3 years ago

Yes, SDK ptp1588 example works as you mentioned for MIMXRT1020 and FRDM-k64f, although the PTP timestamp expected would be the same as in the timestamp in the master node (a Linux ptp4l implementation) as in Zephyr OS, right? Or why is the timestamp beginning at 0.0 seconds? Is the MIMXRT1020 or FRDM-k64f working as a master node? Is it possible to change that?

the SDK exmaple only to use to check whether there is a board issue. it only examples to add timestamp to udp package, which is not a ptp example. In zephyr gptp case, according to the gptp protocal, when you connect two ptp instance, they will send out pdelay_req and then choose to be master or client, then mast starts to send sync packet. I do not have a workable linux host as ptp master. I tried to connect two boards, but it looks buggy as I reported in #29599

Clear with that, thanks @hakehuang !

santi681 commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!
@santi681 , you can look the the eth_mcux.c code and find the CONFIG_PTP_CLOCK_MCUX, this is the option to enalbe ptp time stamp.

Thanks for your replies @hakehuang ! I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do. Defining these config parameters doesn't make any difference to the results. In fact PTP seems to be working properly because slave_time.second are properly synchronized, the problem still remains on getting the slave_time.nanosecond, which is still zero at any time.

jukkar commented 3 years ago

I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do

You should not do that, but set the options in your applications prj.conf file. Some other source file might use these options too in which case they would not be set in all the needed places.

hakehuang commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!
@santi681 , you can look the the eth_mcux.c code and find the CONFIG_PTP_CLOCK_MCUX, this is the option to enalbe ptp time stamp.
Thanks for your replies @hakehuang ! I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do. Defining these config parameters doesn't make any difference to the results. In fact PTP seems to be working properly because slave_time.second are properly synchronized, the problem still remains on getting the slave_time.nanosecond, which is still zero at any time.

the main missing here is in the HAL driver see the modules/hal/nxp/mcux/drivers/imx/CMakeLists.txt

zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)

this enables the HAL driver to output ptp time stamp.

santi681 commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!
@santi681 , you can look the the eth_mcux.c code and find the CONFIG_PTP_CLOCK_MCUX, this is the option to enalbe ptp time stamp.
Thanks for your replies @hakehuang ! I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do. Defining these config parameters doesn't make any difference to the results. In fact PTP seems to be working properly because slave_time.second are properly synchronized, the problem still remains on getting the slave_time.nanosecond, which is still zero at any time.
the main missing here is in the HAL driver see the modules/hal/nxp/mcux/drivers/imx/CMakeLists.txt
zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)
this enables the HAL driver to output ptp time stamp.

CMakeLists.txt file:

#
# Copyright (c) 2018, NXP
#
# SPDX-License-Identifier: Apache-2.0
#

zephyr_include_directories(.)

zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)

zephyr_library_compile_definitions_ifdef(
  CONFIG_HAS_MCUX_CACHE FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL
)

zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN    fsl_flexcan.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_GPT    fsl_gpt.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_PIT    fsl_pit.c)
zephyr_library_sources_ifdef(CONFIG_DISPLAY_MCUX_ELCDIF fsl_elcdif.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA   fsl_edma.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA   fsl_dmamux.c)
zephyr_library_sources_ifdef(CONFIG_ENTROPY_MCUX_TRNG   fsl_trng.c)
zephyr_library_sources_ifdef(CONFIG_ETH_MCUX        fsl_enet.c)
zephyr_library_sources_ifdef(CONFIG_FLASH_MCUX_FLEXSPI  fsl_flexspi.c)
zephyr_library_sources_ifdef(CONFIG_GPIO_MCUX_IGPIO fsl_gpio.c)
zephyr_library_sources_ifdef(CONFIG_HAS_MCUX_CACHE  fsl_cache.c)
zephyr_library_sources_ifdef(CONFIG_I2C_MCUX_LPI2C  fsl_lpi2c.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI    fsl_sai.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI    fsl_sai_edma.c)
zephyr_library_sources_ifdef(CONFIG_PWM_MCUX        fsl_pwm.c)
zephyr_library_sources_ifdef(CONFIG_SPI_MCUX_LPSPI  fsl_lpspi.c)
zephyr_library_sources_ifdef(CONFIG_UART_MCUX_LPUART    fsl_lpuart.c)
zephyr_library_sources_ifdef(CONFIG_VIDEO_MCUX_CSI  fsl_csi.c)
zephyr_library_sources_ifdef(CONFIG_WDT_MCUX_IMX_WDOG    fsl_wdog.c)
zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN    fsl_flexcan.c)

if(NOT CONFIG_ASSERT OR CONFIG_FORCE_NO_ASSERT)
  zephyr_compile_definitions(NDEBUG) # squelch fsl_flexcan.c warning
endif()

I can see these config parameters are already defined. Does it make sense to be able to get the seconds and not the nanosecond? Are the requirements for both parameters the same?

hakehuang commented 3 years ago

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 do you enable below config in mimxrt1020_evk?
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
Hi @hakehuang , this configuration file for frdm_k64f can be easily found at zephyrproject/zephyr/samples/net/gptp/board , but where is the file related to the mimxrt1020_evk configuration? Where do we must add these config parameters for mimxrt1020_evk? Thanks in advance!
@santi681 , you can look the the eth_mcux.c code and find the CONFIG_PTP_CLOCK_MCUX, this is the option to enalbe ptp time stamp.
Thanks for your replies @hakehuang ! I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do. Defining these config parameters doesn't make any difference to the results. In fact PTP seems to be working properly because slave_time.second are properly synchronized, the problem still remains on getting the slave_time.nanosecond, which is still zero at any time.
the main missing here is in the HAL driver see the modules/hal/nxp/mcux/drivers/imx/CMakeLists.txt
zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)
this enables the HAL driver to output ptp time stamp.
CMakeLists.txt file:
#
# Copyright (c) 2018, NXP
#
# SPDX-License-Identifier: Apache-2.0
#

zephyr_include_directories(.)

zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)

zephyr_library_compile_definitions_ifdef(
  CONFIG_HAS_MCUX_CACHE FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL
)

zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN  fsl_flexcan.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_GPT  fsl_gpt.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_PIT  fsl_pit.c)
zephyr_library_sources_ifdef(CONFIG_DISPLAY_MCUX_ELCDIF   fsl_elcdif.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA fsl_edma.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA fsl_dmamux.c)
zephyr_library_sources_ifdef(CONFIG_ENTROPY_MCUX_TRNG fsl_trng.c)
zephyr_library_sources_ifdef(CONFIG_ETH_MCUX      fsl_enet.c)
zephyr_library_sources_ifdef(CONFIG_FLASH_MCUX_FLEXSPI    fsl_flexspi.c)
zephyr_library_sources_ifdef(CONFIG_GPIO_MCUX_IGPIO   fsl_gpio.c)
zephyr_library_sources_ifdef(CONFIG_HAS_MCUX_CACHE    fsl_cache.c)
zephyr_library_sources_ifdef(CONFIG_I2C_MCUX_LPI2C    fsl_lpi2c.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI  fsl_sai.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI  fsl_sai_edma.c)
zephyr_library_sources_ifdef(CONFIG_PWM_MCUX      fsl_pwm.c)
zephyr_library_sources_ifdef(CONFIG_SPI_MCUX_LPSPI    fsl_lpspi.c)
zephyr_library_sources_ifdef(CONFIG_UART_MCUX_LPUART  fsl_lpuart.c)
zephyr_library_sources_ifdef(CONFIG_VIDEO_MCUX_CSI    fsl_csi.c)
zephyr_library_sources_ifdef(CONFIG_WDT_MCUX_IMX_WDOG    fsl_wdog.c)
zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN  fsl_flexcan.c)

if(NOT CONFIG_ASSERT OR CONFIG_FORCE_NO_ASSERT)
  zephyr_compile_definitions(NDEBUG) # squelch fsl_flexcan.c warning
endif()
I can see these config parameters are already defined. Does it make sense to be able to get the seconds and not the nanosecond? Are the requirements for both parameters the same?

it depends on CONFIG_PTP_CLOCK_MCUX, so you need define it in boards just like frdm_k64f does, and this in HAL driver enables the hardware ptp timestamp

kevin137 commented 3 years ago

Hello everyone. I wanted to give a quick update and confirm that we see the same problem on the RT1050EVK. I believe we are configuring prj.conf correctly and have also confirmed that the HAL driver is being referenced. We are preparing a complete package of code, working on the frdm_k64f, NOT working on 1020 and 1050, to be able to share here.

hakehuang commented 3 years ago

Hello everyone. I wanted to give a quick update and confirm that we see the same problem on the RT1050EVK. I believe we are configuring prj.conf correctly and have also confirmed that the HAL driver is being referenced. We are preparing a complete package of code, working on the frdm_k64f, NOT working on 1020 and 1050, to be able to share here.

Thanks kevin. please share your code.

below are my debug log on rt1060 which I think should be same

Breakpoint 2, ENET_Ptp1588GetTimer (base=0x402d8000, handle=handle@entry=0x80000090 <eth0_context+16>, ptpTime=ptpTime@entry=0x8000aca0 <z_interrupt_stacks+1920>)
    at /home/shared/disk/zephyr_project/zephyr_rt1060/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb)
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb)
2866        EnableGlobalIRQ(primask);
(gdb)
eth_rx (context=0x80000080 <eth0_context>) at /home/shared/disk/zephyr_project/zephyr_rt1060/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$6 = 2863311530
(gdb) print ptpTimeData.nanosecond
$7 = 2863311530
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$8 = 2863311530
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$9 = 17367042
(gdb) p
$10 = 17367042

aunsbjerg commented 3 years ago

I am seeing the same issue when running the gPTP sample on the mimxrt1064_evk board.

kevin137 commented 3 years ago

We have some new information. Unfortunately, due to continuing logistical difficulties we still don't have the K64-FRDM at our location, and we are not able to run the code with the latest modifications on it. We ARE able to run the same code on the mimxrt1050_evk and mimxrt1020_evk, however, and we see the same behavior as mentioned in all the previous posts. Beyond this, we are starting to looking at the guts of zephyr/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c, and we think we see some things that aren't working as they should be.

First, a couple of links to our code in github: --The application (basically the samples/net/gptp demo with some LOG_INFs ): https://github.com/ainguraXmarquiegui/zephyr.git --The HAL (likewise, our intention is to just see values of registers): https://github.com/ainguraXmarquiegui/hal_nxp.git

The general issue continues. When the EVK running Zephyr as a slave syncs with our PC running Avnu gptp, we see the seconds field pop into the correct values, but the nanoseconds field is stuck at 0.

Recently, we have started to extract data from what we believe to be the source of the nanosecond field, the ENET_ATVR register (see page 2155 of the i.MX RT1050 Processor Reference Manual, Rev. 4 12/2019). We do this in fsl_enet.c (see https://github.com/ainguraXmarquiegui/hal_nxp/commit/02947be48301a0157c8f76d23462b59595e10645 ), and print it in our application. The exact changes can be seen here: https://github.com/zephyrproject-rtos/hal_nxp/compare/master...ainguraXmarquiegui:feature/aingura_gptp_tests and https://github.com/zephyrproject-rtos/zephyr/compare/master...ainguraXmarquiegui:feature/aingura_gptp_tests

With this code compiled and running, this is what we see on the console of both the 1020EVK and 1050EVK:

[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time SECONDS:                                                                                                                                                                           
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (u) 1619300568                                                                                                                                                              
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x608490D8                                                                                                                                                              
[00:12:21.351,000] <inf> net_gptp_sample:                                                                                                                                                                                                    
[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:                                                                                                                                                                       
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0                                                                                                                                                                   
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0                                                                                                                                                                 
[00:12:21.351,000] <inf> net_gptp_sample: ATCRval! (X) 0xA91                                                                                                                                                                                 
[00:12:21.351,000] <inf> net_gptp_sample: ATCRaddr! (X) 0x402D8400                                                                                                                                                                           
[00:12:21.351,000] <inf> net_gptp_sample: ATVRval! (X) 0x0                                                                                                                                                                                   
[00:12:21.351,000] <inf> net_gptp_sample: ATVRaddr! (X) 0x402D8404

As you can see, the ATVR register address is correct. The ATVR register value, however, is always reading 0x0.

We believe this may be caused by the ATCR Reset Timer bit, which we should see getting automatically cleared. We are always reading 0xA91, but think that we should be seeing an 0x8 instead of 0xA.

hakehuang commented 3 years ago

We have some new information. Unfortunately, due to continuing logistical difficulties we still don't have the K64-FRDM at our location, and we are not able to run the code with the latest modifications on it. We ARE able to run the same code on the mimxrt1050_evk and mimxrt1020_evk, however, and we see the same behavior as mentioned in all the previous posts. Beyond this, we are starting to looking at the guts of zephyr/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c, and we think we see some things that aren't working as they should be.

First, a couple of links to our code in github: --The application (basically the samples/net/gptp demo with some LOG_INFs ): https://github.com/ainguraXmarquiegui/zephyr.git --The HAL (likewise, our intention is to just see values of registers): https://github.com/ainguraXmarquiegui/hal_nxp.git

The general issue continues. When the EVK running Zephyr as a slave syncs with our PC running Avnu gptp, we see the seconds field pop into the correct values, but the nanoseconds field is stuck at 0.

Recently, we have started to extract data from what we believe to be the source of the nanosecond field, the ENET_ATVR register (see page 2155 of the i.MX RT1050 Processor Reference Manual, Rev. 4 12/2019). We do this in fsl_enet.c (see ainguraXmarquiegui/hal_nxp@02947be ), and print it in our application. The exact changes can be seen here: zephyrproject-rtos/hal_nxp@master...ainguraXmarquiegui:feature/aingura_gptp_tests and master...ainguraXmarquiegui:feature/aingura_gptp_tests

With this code compiled and running, this is what we see on the console of both the 1020EVK and 1050EVK:
[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time SECONDS:                                                                                                                                                                           
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (u) 1619300568                                                                                                                                                              
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x608490D8                                                                                                                                                              
[00:12:21.351,000] <inf> net_gptp_sample:                                                                                                                                                                                                    
[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:                                                                                                                                                                       
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0                                                                                                                                                                   
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0                                                                                                                                                                 
[00:12:21.351,000] <inf> net_gptp_sample: ATCRval! (X) 0xA91                                                                                                                                                                                 
[00:12:21.351,000] <inf> net_gptp_sample: ATCRaddr! (X) 0x402D8400                                                                                                                                                                           
[00:12:21.351,000] <inf> net_gptp_sample: ATVRval! (X) 0x0                                                                                                                                                                                   
[00:12:21.351,000] <inf> net_gptp_sample: ATVRaddr! (X) 0x402D8404
As you can see, the ATVR register address is correct. The ATVR register value, however, is always reading 0x0.

We believe this may be caused by the ATCR Reset Timer bit, which we should see getting automatically cleared. We are always reading 0xA91, but think that we should be seeing an 0x8 instead of 0xA.

I check your https://github.com/ainguraXmarquiegui/zephyr/tree/feature/aingura_gptp_tests but I did not see you modify the west.yml to direct your hal code. how do you do that? do you run west update before your build?

kevin137 commented 3 years ago

Yes. We ran west update initially, and we are not changing between platforms, it is always either RT1050 or RT1020. My colleague @ainguraXmarquiegui ran west update just before creating the fork.

hakehuang commented 3 years ago

can you have a picture on your real boards connection? and if possible can I have your linux host boards shipped to me?

kevin137 commented 3 years ago

Hello @hakehuang and everyone listening in,

To anyone trying our code, please do a fresh checkout of the latest commit at:

https://github.com/ainguraXmarquiegui/zephyr/tree/feature/aingura_gptp_tests and https://github.com/ainguraXmarquiegui/hal_nxp/tree/feature/aingura_gptp_tests

Thanks to our friends at EBV, we got our hands on a FRDM-K64F and were able to run the same code on all three boards, and have confirmed that the FRDM works as expected, and the i.MXRT boards do not--and we are quite sure that it is related to the Reset Timer bit in the ATCR register.

First, as requested, this is the test setup, including are how the cables are hooked up:

FRDM-K64F, connected via USB to my PC (via the SDA USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

photo_2021-04-28_17-10-56

MIMXRT1020-EVK, connected via USB to my PC (via the J13 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

photo_2021-04-28_17-11-08

MIMXRT1050-EVKB, connected via USB to my PC (via the J28 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

photo_2021-04-28_17-11-21

My PC is nothing special. It does have an Intel I219-LM LOM Ethernet interface with hardware timestamping, which is very useful for gPTP, but most corporate PCs with NICs from Intel or Broadcom should have this capability:

kcook@i9coffee:~$ lshw 2>/dev/null | grep -B6 -A1 enp0s31f6 
        *-network
             description: Ethernet interface
             product: Ethernet Connection (7) I219-LM
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 10
kcook@i9coffee:~$ sudo ethtool -T enp0s31f6 
Time stamping parameters for enp0s31f6:
Capabilities:
    hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
    software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
    hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
    software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
    software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
    hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
    off                   (HWTSTAMP_TX_OFF)
    on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
    none                  (HWTSTAMP_FILTER_NONE)
    all                   (HWTSTAMP_FILTER_ALL)
    ptpv1-l4-sync         (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
    ptpv1-l4-delay-req    (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
    ptpv2-l4-sync         (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
    ptpv2-l4-delay-req    (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
    ptpv2-l2-sync         (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
    ptpv2-l2-delay-req    (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
    ptpv2-event           (HWTSTAMP_FILTER_PTP_V2_EVENT)
    ptpv2-sync            (HWTSTAMP_FILTER_PTP_V2_SYNC)
    ptpv2-delay-req       (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)

As suggested in the README in the gptp sample application, I have installed OpenAvnu/gPTP: https://docs.zephyrproject.org/latest/samples/net/gptp/README.html#setting-up-linux-host

As specified I have increased the neighborPropDelayThresh in the gptp_cfg.ini file, in this case to 1000000, to absorb any imprecise adjustment in the PHY delay on the Zephyr side. Here is the gptp_cfg.ini file we are using: gptp_cfg.ini.txt

Here is what we are seeing, with the SAME code, running in the SAME way, on all three boards

FRDM-K64F

Note how the Reset Timer bit in the ATCR register is 0, the nanoseconds field is getting filled, and the application is able to display nanoseconds:

FRDM64

MIMXRT1020-EVK

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

RT1020

MIMXRT1050-EVKB

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

RT1050

It is not going to be practical for me to send you my principal PC. If it would be VERY helpful and DRASTICALLY reduce the time it will take to resolve this issue, I can find another host device with hardware timestamping support (for OpenAvnu gptp), and send it to you.

Let's get this thing solved! Thanks everybody!

hakehuang commented 3 years ago

Hello @hakehuang and everyone listening in,

To anyone trying our code, please do a fresh checkout of the latest commit at:

https://github.com/ainguraXmarquiegui/zephyr/tree/feature/aingura_gptp_tests and https://github.com/ainguraXmarquiegui/hal_nxp/tree/feature/aingura_gptp_tests

Thanks to our friends at EBV, we got our hands on a FRDM-K64F and were able to run the same code on all three boards, and have confirmed that the FRDM works as expected, and the i.MXRT boards do not--and we are quite sure that it is related to the Reset Timer bit in the ATCR register.

First, as requested, this is the test setup, including are how the cables are hooked up:

FRDM-K64F, connected via USB to my PC (via the SDA USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

MIMXRT1020-EVK, connected via USB to my PC (via the J13 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

MIMXRT1050-EVKB, connected via USB to my PC (via the J28 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

My PC is nothing special. It does have an Intel I219-LM LOM Ethernet interface with hardware timestamping, which is very useful for gPTP, but most corporate PCs with NICs from Intel or Broadcom should have this capability:
kcook@i9coffee:~$ lshw 2>/dev/null | grep -B6 -A1 enp0s31f6 
        *-network
             description: Ethernet interface
             product: Ethernet Connection (7) I219-LM
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 10
kcook@i9coffee:~$ sudo ethtool -T enp0s31f6 
Time stamping parameters for enp0s31f6:
Capabilities:
  hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
  software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
  hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
  software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
  software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
  hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
  off                   (HWTSTAMP_TX_OFF)
  on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
  none                  (HWTSTAMP_FILTER_NONE)
  all                   (HWTSTAMP_FILTER_ALL)
  ptpv1-l4-sync         (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
  ptpv1-l4-delay-req    (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
  ptpv2-l4-sync         (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
  ptpv2-l4-delay-req    (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
  ptpv2-l2-sync         (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
  ptpv2-l2-delay-req    (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
  ptpv2-event           (HWTSTAMP_FILTER_PTP_V2_EVENT)
  ptpv2-sync            (HWTSTAMP_FILTER_PTP_V2_SYNC)
  ptpv2-delay-req       (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)
As suggested in the README in the gptp sample application, I have installed OpenAvnu/gPTP: https://docs.zephyrproject.org/latest/samples/net/gptp/README.html#setting-up-linux-host

As specified I have increased the neighborPropDelayThresh in the gptp_cfg.ini file, in this case to 1000000, to absorb any imprecise adjustment in the PHY delay on the Zephyr side. Here is the gptp_cfg.ini file we are using: gptp_cfg.ini.txt

Here is what we are seeing, with the SAME code, running in the SAME way, on all three boards

FRDM-K64F

Note how the Reset Timer bit in the ATCR register is 0, the nanoseconds field is getting filled, and the application is able to display nanoseconds:

MIMXRT1020-EVK

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

MIMXRT1050-EVKB

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

It is not going to be practical for me to send you my principal PC. If it would be VERY helpful and DRASTICALLY reduce the time it will take to resolve this issue, I can find another host device with hardware timestamping support (for OpenAvnu gptp), and send it to you.

Let's get this thing solved! Thanks everybody!

@kevin137 , can you try same? set a break point at ENET_Ptp1588GetTimer , and show me what you can see? you code in gptp/src/main.c has some problem, because it is to check the register after the init_testing, but if the init_testing quit, the whole test is end, see below code. what is the ret when you quit? at my side the for rt boards the status is the board is running in DISABLED. but for frdmk64f the board is running in master mode

void init_testing(void)
{
    uint32_t uptime = k_uptime_get_32();
    int ret;

    if (run_duration == 0) {
        return;
    }

    k_sem_init(&quit_lock, 0, K_SEM_MAX_LIMIT);

    k_work_init_delayable(&stop_sample, stop_handler);
    k_work_reschedule(&stop_sample, K_SECONDS(run_duration));

    k_sem_take(&quit_lock, K_FOREVER);

    LOG_INF("Stopping after %u seconds",
        (k_uptime_get_32() - uptime) / 1000);

    /* Try to figure out what is the sync state.
     * Return:
     *  <0 - configuration error
     *   0 - not time sync
     *   1 - we are MASTER
     *   2 - we are SLAVE
     */
    ret = get_current_status();

    exit(ret);
}

kevin137 commented 3 years ago

Hi again @hakehuang ,

I'm having a few logistical issues and am not able to get the debugger to run on the 1050 this evening. No problems on the K64F, but the interesting stuff is happening on the 1050.

We see what you are getting at with the init_testing comment, but we don't see the exit happen because in our case, run_duration is 0, so init_testing exits at the fifth line, "exit(ret);" never runs, and all of our code runs after that.

I will try again in the morning to get the debugger working and stop at the breakpoint you are requesting.

hakehuang commented 3 years ago

on board debug should work, you just use below command line

west debug --runner=pyocd

kevin137 commented 3 years ago

Hello again @hakehuang and friends,

Our local software expert @ainguraXmarquiegui has been helping out all along, but now he has stepped in directly to setup the breakpoints, take data, and also compile and run the equivalent code for FreeRTOS. First we would like to show that we are NOT exiting directly out of the init_testing() function. Using the MIMXRT1020-EVK, @ainguraXmarquiegui set some breakpoints both inside the function, and then after.

Here we are, inside init_testing(), run_duration has been optimized out by the compiler, the "if" is asserted, we execute the LOG_INF, and return out of the function, long before the exit(ret) at the end of init_testing(): photo_2021-04-30_17-31-01

Now, we are back in main(), and the program continues executing: photo_2021-04-30_17-35-39

Moving on to the requested breakpoint at ENET_Ptp1588GetTimer. Here it is on the MIMXRT1020-EVK: photo_2021-04-30_17-38-54 As you can see, the ATCRval register is reading 0xa91

@ainguraXmarquiegui has compiled the equivalent code, loaded it on the same board, and set the same breakpoint in FreeRTOS: photo_2021-04-30_17-45-39 And here, the ATCR register is reading 0x91, as we think it should be.

We hope that this will help narrow in on what is going on. Have a great weekend everybody! Thanks again.

ainguraXmarquiegui commented 3 years ago

I have been trying to compare code from FreeRTOs and Zephyr and I haven't been able to identify any major differences.

On the other hand, I have been able to narrow down a little bit more on the apparent root of the cause. Function ENET_Ptp1588StartTimer on

https://github.com/zephyrproject-rtos/hal_nxp/blob/e7b8d8ba5932f70eb5f978ad7f6f278cf0bf565c/mcux/drivers/imx/fsl_enet.c#L2801

sets the ENET_ATCR_RESTART_MASK. And although the reference manual states that it should go back to 0, it never does.

We previously observed this oddity in the function ENET_Ptp1588GetTimerNoIrqDisable, but seeing right in the initialization function without the need to have any custom code makes replicating the problem easier.

I have gone through the reference manual, but I'm unable to find any type of explanation why that bit is behaving like it should in FreeRTOs but not in Zephyr. It's like that part of the system is turned off or something.

Is there anything else we can do to help fixing this problem?

hakehuang commented 3 years ago

Hello again @hakehuang and friends,

Our local software expert @ainguraXmarquiegui has been helping out all along, but now he has stepped in directly to setup the breakpoints, take data, and also compile and run the equivalent code for FreeRTOS. First we would like to show that we are NOT exiting directly out of the init_testing() function. Using the MIMXRT1020-EVK, @ainguraXmarquiegui set some breakpoints both inside the function, and then after.

Here we are, inside init_testing(), run_duration has been optimized out by the compiler, the "if" is asserted, we execute the LOG_INF, and return out of the function, long before the exit(ret) at the end of init_testing():

Now, we are back in main(), and the program continues executing:

Moving on to the requested breakpoint at ENET_Ptp1588GetTimer. Here it is on the MIMXRT1020-EVK: As you can see, the ATCRval register is reading 0xa91

@ainguraXmarquiegui has compiled the equivalent code, loaded it on the same board, and set the same breakpoint in FreeRTOS: And here, the ATCR register is reading 0x91, as we think it should be.

We hope that this will help narrow in on what is going on. Have a great weekend everybody! Thanks again.

Hi @kevin137 so do you get a chance to check with debugger as I do? I see you steps above, and you still checking the control register.

ainguraXmarquiegui commented 3 years ago

Hello @hakehuang . Yes, we posted the registers because we think that information is more valuable and closer to the root of the problem. We might be wrong, though...

Seeing that you are still interested in us replicating your steps with the debugger, we have done so in two of our EVKs.

Here's what we see when we run the debug session on the i.MX-rt1020-EVK:

(gdb) break /home/xmarquiegui/zephyrproject/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c:ENET_Ptp1588GetTimer
Breakpoint 1 at 0x60022174: file /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h, line 453.
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, ENET_Ptp1588GetTimer (base=0x402d8000, handle=handle@entry=0x80000090 <eth0_context+16>, ptpTime=ptpTime@entry=0x8000ac60 <z_interrupt_stacks+1920>) at /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb) 
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb) 
2866        EnableGlobalIRQ(primask);
(gdb) 
eth_rx (context=0x80000080 <eth0_context>) at /home/xmarquiegui/zephyrproject/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$1 = 2863311530
(gdb) print ptpTimeData.nanosecond
$2 = 2863311530
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$3 = 2863311530
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$4 = 0
(gdb) p
$5 = 0
(gdb)

Here's what we see when we run the debug session on the frdm_k64f EVK:

(gdb) break /home/xmarquiegui/zephyrproject/modules/hal/nxp/mcux/drivers/kinetis/fsl_enet.c:ENET_Ptp1588GetTimer
Breakpoint 1 at 0x1fb2c: file /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h, line 453.
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, ENET_Ptp1588GetTimer (base=0x400c0000, handle=handle@entry=0x20000010 <eth0_context+16>, ptpTime=ptpTime@entry=0x2000acb0 <z_interrupt_stacks+1872>) at /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb) 
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb) 
2862            ptpTime->second++;
(gdb) 
2866        EnableGlobalIRQ(primask);
(gdb) 
eth_rx (context=0x20000000 <eth0_context>) at /home/xmarquiegui/zephyrproject/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$1 = 724525460
(gdb) print ptpTimeData.nanosecond
$2 = 668545480
(gdb) n
800                             ptpTimeData.second--;
(gdb) print ptpTimeData.nanosecond
$3 = 668545480
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$4 = 668545480
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$5 = 668545480
(gdb) p
$6 = 668545480

Any thoughts? Would you be interested in doing a remote session so that you can interact with our board from your remote location?

hakehuang commented 3 years ago

Hello @hakehuang . Yes, we posted the registers because we think that information is more valuable and closer to the root of the problem. We might be wrong, though...

Seeing that you are still interested in us replicating your steps with the debugger, we have done so in two of our EVKs.

Here's what we see when we run the debug session on the i.MX-rt1020-EVK:

(gdb) break /home/xmarquiegui/zephyrproject/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c:ENET_Ptp1588GetTimer
Breakpoint 1 at 0x60022174: file /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h, line 453.
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, ENET_Ptp1588GetTimer (base=0x402d8000, handle=handle@entry=0x80000090 <eth0_context+16>, ptpTime=ptpTime@entry=0x8000ac60 <z_interrupt_stacks+1920>) at /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb) 
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb) 
2866        EnableGlobalIRQ(primask);
(gdb) 
eth_rx (context=0x80000080 <eth0_context>) at /home/xmarquiegui/zephyrproject/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$1 = 2863311530
(gdb) print ptpTimeData.nanosecond
$2 = 2863311530
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$3 = 2863311530
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$4 = 0
(gdb) p
$5 = 0
(gdb)

Here's what we see when we run the debug session on the frdm_k64f EVK:

(gdb) break /home/xmarquiegui/zephyrproject/modules/hal/nxp/mcux/drivers/kinetis/fsl_enet.c:ENET_Ptp1588GetTimer
Breakpoint 1 at 0x1fb2c: file /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h, line 453.
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, ENET_Ptp1588GetTimer (base=0x400c0000, handle=handle@entry=0x20000010 <eth0_context+16>, ptpTime=ptpTime@entry=0x2000acb0 <z_interrupt_stacks+1872>) at /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb) 
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb) 
2862            ptpTime->second++;
(gdb) 
2866        EnableGlobalIRQ(primask);
(gdb) 
eth_rx (context=0x20000000 <eth0_context>) at /home/xmarquiegui/zephyrproject/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$1 = 724525460
(gdb) print ptpTimeData.nanosecond
$2 = 668545480
(gdb) n
800                             ptpTimeData.second--;
(gdb) print ptpTimeData.nanosecond
$3 = 668545480
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$4 = 668545480
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$5 = 668545480
(gdb) p
$6 = 668545480

Any thoughts? Would you be interested in doing a remote session so that you can interact with our board from your remote location?

gptp

Now it looks like something strange, this is only one line assigment what I can see is that ptpTimeData.nanosecond is updated with correct value, but pkt->timestamp.nanosecond is not assigned after 803 pkt->timestamp.nanosecond = ptpTimeData.nanosecond;

it is not possible, if you use a printf function here, you will see the value shall be assigned.

so my observation is that, for RT series the ptp timestamp is enable, but for some reason RT board goes to stop mode, whereas the FRDM board goes to master mode, according to the gptp protocol, master mode instance will keep sending SYNC packet, that's why you see the timestamp during your debugging. and I believe the root cause is the same as what I have reported before, there is some state machine issue in the gptp stack. I am not an expect for this, quick fix may not be a good solution, we need ask gptp owner for better solution. anyway it is not low layer driver issue, as you can see the timestamp generated.

kevin137 commented 3 years ago

Ok @hakehuang , we agree that line 803 is very strange. It certainly seems that, after the assignment, pkt->timestamp.nanosecond should get the value in ptpTimeData.nanosecond. We are quite convinced that the problem is related to the ATCR Reset Timer bit not clearing, as we attempted to show in previous posts, but in any case, we agree that there is some state machine issue.

@MaureenHelm , How can we keep this moving forward?

reibax-marcus commented 3 years ago

Hello @hakehuang ,

Thank you very much for your time. Allow me to disagree on the fact that it is not a low layer driver issue. I think we are running into more than problem here. I did exactly the steps you asked to get this unstuck, but I have more observations.

In both your test run, and our test run the debugger is misleading us. ptpTimeData.nanosecond in not being updated with the correct value, and that is why pkt->timestamp.nanosecond in 0 after that step.

Allow me to demonstrate why I think this is happening.

See this comment that you posted: https://github.com/zephyrproject-rtos/zephyr/issues/33747#issuecomment-822556488

Notice the value of ptpTimeData.nanosecond in your test run:

(gdb) print ptpTimeData.nanosecond
$7 = 2863311530

I can see how at first glance that looks like a correct value, but let's convert that to hex:

2863311530 = 0xAAAAAAAA

Strange, isn't it?

Now, let's move forward to my test run at: https://github.com/zephyrproject-rtos/zephyr/issues/33747#issuecomment-833519663

Notice the value of ptpTimeData.nanosecond in my test run:

(gdb) print ptpTimeData.nanosecond
$2 = 2863311530

It's exactly the same. Strange, isn't it? How likely is it that you and me run into the exact same nanosecond? If you keep running the debugger you will see that that value doesn't change, which means that it's not a correct nanosecond value.

Now, let's verify if the debugger is working properly or not.

If we patch https://github.com/zephyrproject-rtos/zephyr/tree/d93297b3d078d2191935acf4c53ba302c08fd427 like this:

 diff --git a/drivers/ethernet/eth_mcux.c b/drivers/ethernet/eth_mcux.c
index db21aa39b6..f71effbfcd 100644
--- a/drivers/ethernet/eth_mcux.c
+++ b/drivers/ethernet/eth_mcux.c
@@ -30,6 +30,7 @@ LOG_MODULE_REGISTER(LOG_MODULE_NAME);
 #include <net/net_if.h>
 #include <net/ethernet.h>
 #include <ethernet/eth_stats.h>
+#include <sys/printk.h>

 #if defined(CONFIG_PTP_CLOCK_MCUX)
 #include <drivers/ptp_clock.h>
@@ -809,6 +810,9 @@ static void eth_rx(struct eth_context *context)
        }
 #endif /* CONFIG_PTP_CLOCK_MCUX */

+       printk("XXX - DBG - ptpTimedata.nanosecond: 0x%X\n", ptpTimeData.nanosecond);
+       printk("XXX - DBG - pkt->timestamp.nanosecond: 0x%X\n", pkt->timestamp.nanosecond);
+
        irq_unlock(imask);

        iface = get_iface(context, vlan_tag);

And run the same test, here's what we see:

Debug session:

(gdb) break /home/xmarquiegui/zephyrproject/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c:ENET_Ptp1588GetTimer
Breakpoint 1 at 0x6002218c: file /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h, line 453.
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, ENET_Ptp1588GetTimer (base=0x402d8000, handle=handle@entry=0x80000090 <eth0_context+16>, 
    ptpTime=ptpTime@entry=0x8000ac60 <z_interrupt_stacks+1920>)
    at /home/xmarquiegui/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) next
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb) 
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb) 
2866        EnableGlobalIRQ(primask);
(gdb) 
eth_rx (context=0x80000080 <eth0_context>) at /home/xmarquiegui/zephyrproject/zephyr/drivers/ethernet/eth_mcux.c:800
800                     if (ptpTimeData.nanosecond < ts) {
(gdb) display /x ptpTimeData 
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
(gdb) display /x pkt->timestamp
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}
(gdb) next
804                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}
(gdb) 
805                     pkt->timestamp.second = ptpTimeData.second;
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}
(gdb) 
813             printk("XXX - DBG - ptpTimedata.nanosecond: 0x%X\n", ptpTimeData.nanosecond);
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}
(gdb) 
814             printk("XXX - DBG - pkt->timestamp.nanosecond: 0x%X\n", pkt->timestamp.nanosecond);
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}
(gdb) 
816             irq_unlock(imask);
1: /x ptpTimeData = {second = 0xaaaaaaaaaaaaaaaa, nanosecond = 0xaaaaaaaa}
2: /x pkt->timestamp = {{_sec = {low = 0x0, high = 0x0, unused = 0x0}, second = 0x0}, nanosecond = 0x0}

Console:

*** Booting Zephyr OS build zephyr-v2.5.0-2550-gd93297b3d078  ***                                                                      

[00:00:00.323,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:00.326,000] <inf> net_config: Initializing network                                                                              
[00:00:00.326,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...                                                      
[00:00:01.325,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:01.327,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:02.328,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:02.331,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:03.333,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:03.336,000] <wrn> net_if: iface 0x80001008 is down                                                                              
uart:~$ XXX - DBG - ptpTimedata.nanosecond: 0x0                                                                                        
XXX - DBG - pkt->timestamp.nanosecond: 0x0

TLDR: Even if the debugger is showing us that 0xAAAAAAAA (2863311530) value for ptpTimeData.nanosecond, if we print the value the system actually has 0 on that variable. We can't trust the debugger.

jukkar commented 3 years ago

I can see how at first glance that looks like a correct value, but let's convert that to hex:
2863311530 = 0xAAAAAAAA

This looks like some kind of stack issue

hakehuang commented 3 years ago

I can see how at first glance that looks like a correct value, but let's convert that to hex:
2863311530 = 0xAAAAAAAA
This looks like some kind of stack issue

I enable the CONFIG_NO_OPTIMIZATIONS=y and get below error:

[00:00:02.235,000] <wrn> net_gptp: Reset Pdelay requests
[00:00:02.239,000] <wrn> net_if: iface 0x80001160 is down
[00:00:03.013,000] <err> os: ***** MPU FAULT *****
[00:00:03.013,000] <err> os:   Stacking error (context area might be not valid)
[00:00:03.013,000] <err> os: r0/a1:  0xaaaaaaaa  r1/a2:  0xaaaaaaaa  r2/a3:  0x00000000
[00:00:03.013,000] <err> os: r3/a4:  0x00000007 r12/ip:  0xe000ed00 r14/lr:  0x00000000
[00:00:03.013,000] <err> os:  xpsr:  0x6006ca00
[00:00:03.013,000] <err> os: Faulting instruction address (r15/pc): 0x00000000
[00:00:03.013,000] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:03.013,000] <err> os: Current thread: 0x80003938 (sysworkq)
[00:00:03.063,000] <err> os: Halting system

ainguraXmarquiegui commented 3 years ago

Things I have tried to work around these issues:

Compile the zephyr version we have been working with against an older SDK with a previous version of gcc: SDK version 0.11.4
Upgrade nxp hal components with the latest version available in mcuxpresso freertos source code (available in feature/mcuxpresso_backport branch of my hal work repo)
Play around with all _STACK_SIZE config options that I have been able to locate in the .config file
Compare zephyr/soc/arm/nxp_imx/rt/soc.c file with other analogous files

So far I haven't been able to make any progress. Can we get someone else involved that could give this a push? This is the kind of error that is a deal-breaker for any user interested in zephyr for this platform.

hakehuang commented 3 years ago

Things I have tried to work around these issues:

Compile the zephyr version we have been working with against an older SDK with a previous version of gcc: SDK version 0.11.4

Upgrade nxp hal components with the latest version available in mcuxpresso freertos source code (available in feature/mcuxpresso_backport branch of my hal work repo)

Play around with all _STACK_SIZE config options that I have been able to locate in the .config file

Compare zephyr/soc/arm/nxp_imx/rt/soc.c file with other analogous files

So far I haven't been able to make any progress. Can we get someone else involved that could give this a push? This is the kind of error that is a deal-breaker for any user interested in zephyr for this platform.

This is my config seting for rt boards, still in debug, I have rise this issue on Zephyr testing meeting.

# MCUX driver settings
CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y
CONFIG_NO_OPTIMIZATIONS=y
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=409600

this may be something wrong in the enet_mcux.c driver, I am checking with desgin team on possible issue. I try below code to debug this, it keeps in while loop.

diff --git a/mcux/drivers/imx/fsl_enet.c b/mcux/drivers/imx/fsl_enet.c
index 2e9d229..d17ca3f 100644
--- a/mcux/drivers/imx/fsl_enet.c
+++ b/mcux/drivers/imx/fsl_enet.c
@@ -2800,14 +2800,21 @@ void ENET_Ptp1588Configure(ENET_Type *base, enet_handle_t *handle, enet_ptp_conf
  */
 void ENET_Ptp1588StartTimer(ENET_Type *base, uint32_t ptpClkSrc)
 {
+    printk(" base->ATCR = %x\n",  base->ATCR);
     /* Restart PTP 1588 timer, master clock. */
-    base->ATCR = ENET_ATCR_RESTART_MASK;
+    base->ATCR |= ENET_ATCR_RESTART_MASK;
+
+    while ((base->ATCR & ENET_ATCR_RESTART_MASK) == ENET_ATCR_RESTART_MASK) {
+         ;
+    }

     /* Initializes PTP 1588 timer. */
     base->ATINC = ENET_ATINC_INC(ENET_NANOSECOND_ONE_SECOND / ptpClkSrc);
     base->ATPER = ENET_NANOSECOND_ONE_SECOND;
     /* Sets periodical event and the event signal output assertion and Actives PTP 1588 timer.  */
     base->ATCR = ENET_ATCR_PEREN_MASK | ENET_ATCR_PINPER_MASK | ENET_ATCR_EN_MASK;
+    printk(" base->ATCR = %x\n",  base->ATCR);
+    printk("Start Timer\n");
 }

the console output is like below

base->ATCR = 0

according to the RM this shall be This field automatically clears to 0 after the command completes

ainguraXmarquiegui commented 3 years ago

Exactly. That's what we are seeing too. We are on the same track now. I hope the design team can shed some new light on the issue. Thank you!

hakehuang commented 3 years ago

@ainguraXmarquiegui , thanks @MaureenHelm help. with below patch, now the ACTR work fine. can you give a try at your side? Thanks in advance

ubuntu@ubuntu-OptiPlex-7050:/home/shared/disk/zephyr_project/gptp/zephyr/samples/net/gptp$ git diff ../../../soc/arm/nxp_imx/rt/soc.c
diff --git a/soc/arm/nxp_imx/rt/soc.c b/soc/arm/nxp_imx/rt/soc.c
index d818c7a8fd..a9032dc4f2 100644
--- a/soc/arm/nxp_imx/rt/soc.c
+++ b/soc/arm/nxp_imx/rt/soc.c
@@ -143,6 +143,22 @@ static ALWAYS_INLINE void clock_init(void)
        CLOCK_InitUsb1Pll(&usb1PllConfig); /* Configure USB1 PLL to 480M */
 #endif
 #ifdef CONFIG_INIT_ENET_PLL
+       /* DeInit Enet PLL. */
+       CLOCK_DeinitEnetPll();
+       /* Bypass Enet PLL. */
+       CLOCK_SetPllBypass(CCM_ANALOG, kCLOCK_PllEnet, 1);
+       /* Set Enet output divider. */
+       CCM_ANALOG->PLL_ENET =
+            (CCM_ANALOG->PLL_ENET & (~CCM_ANALOG_PLL_ENET_DIV_SELECT_MASK)) | CCM_ANALOG_PLL_ENET_DIV_SELECT(1);
+       /* Enable Enet output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENABLE_MASK;
+       /* Set Enet2 output divider. */
+       CCM_ANALOG->PLL_ENET =
+           (CCM_ANALOG->PLL_ENET & (~CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_MASK)) | CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT(0);
+       /* Enable Enet2 output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENET2_REF_EN_MASK;
+       /* Enable Enet25M output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENET_25M_REF_EN_MASK;
        CLOCK_InitEnetPll(&ethPllConfig);
 #endif
 #ifdef CONFIG_INIT_VIDEO_PLL

ainguraXmarquiegui commented 3 years ago

Thank you @MaureenHelm and @hakehuang for your support. I think we are now getting closer to a solution.

Applying your suggested change makes the compilation fail, so I have had to make a few additional changes:

zephyr changes:

diff --git a/soc/arm/nxp_imx/rt/soc.c b/soc/arm/nxp_imx/rt/soc.c
index d818c7a8fd..5991aea791 100644
--- a/soc/arm/nxp_imx/rt/soc.c
+++ b/soc/arm/nxp_imx/rt/soc.c
@@ -103,6 +103,30 @@ const __imx_boot_ivt_section ivt image_vector_table = {
 };
 #endif

+/*! ENET2_DIV_SELECT
+ *  0b00..25MHz
+ *  0b01..50MHz
+ *  0b10..100MHz (not 50% duty cycle)
+ *  0b11..125MHz
+ */
+#define CCM_ANALOG_PLL_ENET_SET_ENET2_DIV_SELECT(x) (((uint32_t)(((uint32_t)(x)) << CCM_ANALOG_PLL_ENET_SET_ENET2_DIV_SELECT_SHIFT)) & CCM_ANALOG_PLL_ENET_SET_ENET2_DIV_SELECT_MASK)
+/*! ENET2_DIV_SELECT
+ *  0b00..25MHz
+ *  0b01..50MHz
+ *  0b10..100MHz (not 50% duty cycle)
+ *  0b11..125MHz
+ */
+#define CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT(x)  (((uint32_t)(((uint32_t)(x)) << CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_SHIFT)) & CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_MASK)
+
+
+#define CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_MASK (0xCU)
+#define CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_SHIFT (2U)
+#define CCM_ANALOG_PLL_ENET_ENET2_REF_EN_MASK    (0x100000U)
+#define CCM_ANALOG_PLL_ENET_ENET2_REF_EN_SHIFT   (20U)
+#define CCM_ANALOG_PLL_ENET_ENET2_REF_EN(x)      (((uint32_t)(((uint32_t)(x)) << CCM_ANALOG_PLL_ENET_ENET2_REF_EN_SHIFT)) & CCM_ANALOG_PLL_ENET_ENET2_REF_EN_MASK)
+
+
+
 /**
  *
  * @brief Initialize the system clock
@@ -143,6 +167,22 @@ static ALWAYS_INLINE void clock_init(void)
        CLOCK_InitUsb1Pll(&usb1PllConfig); /* Configure USB1 PLL to 480M */
 #endif
 #ifdef CONFIG_INIT_ENET_PLL
+       /* DeInit Enet PLL. */
+       CLOCK_DeinitEnetPll();
+       /* Bypass Enet PLL. */
+       CLOCK_SetPllBypass(CCM_ANALOG, kCLOCK_PllEnet, 1);
+       /* Set Enet output divider. */
+       CCM_ANALOG->PLL_ENET =
+            (CCM_ANALOG->PLL_ENET & (~CCM_ANALOG_PLL_ENET_DIV_SELECT_MASK)) | CCM_ANALOG_PLL_ENET_DIV_SELECT(1);
+       /* Enable Enet output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENABLE_MASK;
+       /* Set Enet2 output divider. */
+       CCM_ANALOG->PLL_ENET =
+           (CCM_ANALOG->PLL_ENET & (~CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT_MASK)) | CCM_ANALOG_PLL_ENET_ENET2_DIV_SELECT(0);
+       /* Enable Enet2 output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENET2_REF_EN_MASK;
+       /* Enable Enet25M output. */
+       CCM_ANALOG->PLL_ENET |= CCM_ANALOG_PLL_ENET_ENET_25M_REF_EN_MASK;
        CLOCK_InitEnetPll(&ethPllConfig);
 #endif
 #ifdef CONFIG_INIT_VIDEO_PLL

nxp hal module changes:

diff --git a/mcux/drivers/imx/fsl_enet.c b/mcux/drivers/imx/fsl_enet.c
index 2e9d229..3d93806 100644
--- a/mcux/drivers/imx/fsl_enet.c
+++ b/mcux/drivers/imx/fsl_enet.c
@@ -2800,14 +2800,21 @@ void ENET_Ptp1588Configure(ENET_Type *base, enet_handle_t *handle, enet_ptp_conf
  */
 void ENET_Ptp1588StartTimer(ENET_Type *base, uint32_t ptpClkSrc)
 {
+    printk(" base->ATCR = %x\n",  base->ATCR);
     /* Restart PTP 1588 timer, master clock. */
     base->ATCR = ENET_ATCR_RESTART_MASK;

+    while ((base->ATCR & ENET_ATCR_RESTART_MASK) == ENET_ATCR_RESTART_MASK) {
+         ;
+    }
+
     /* Initializes PTP 1588 timer. */
     base->ATINC = ENET_ATINC_INC(ENET_NANOSECOND_ONE_SECOND / ptpClkSrc);
     base->ATPER = ENET_NANOSECOND_ONE_SECOND;
     /* Sets periodical event and the event signal output assertion and Actives PTP 1588 timer.  */
     base->ATCR = ENET_ATCR_PEREN_MASK | ENET_ATCR_PINPER_MASK | ENET_ATCR_EN_MASK;
+    printk(" base->ATCR = %x\n",  base->ATCR);
+    printk("Start Timer\n");
 }

 /*!

With these changes I can now build and flash the application. The good news is ATCR register now seems to be showing the expected behavior. Bad news is some additional errors have popped up. See the console output:

 base->ATCR = 0                                                                      
 base->ATCR = 91                                                                     
Start Timer                                                                          
*** Booting Zephyr OS build v2.6.0-rc1-118-g05d8c6fc78b2  ***                        

[00:00:00.323,000] <wrn> net_if: iface 0x80001008 is down                            
[00:00:00.326,000] <inf> net_config: Initializing network                            
[00:00:00.326,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...    
[00:00:01.324,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:01.327,000] <wrn> net_if: iface 0x80001008 is down                            
[00:00:02.328,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:02.331,000] <wrn> net_if: iface 0x80001008 is down                            
[00:00:03.332,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:03.335,000] <wrn> net_if: iface 0x80001008 is down                            
[00:00:04.336,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:04.339,000] <wrn> net_if: iface 0x80001008 is down                            
[00:00:05.017,000] <inf> net_config: Interface 1 (0x80001008) coming up              
[00:00:05.017,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.              
[00:00:05.017,000] <inf> net_config: IPv4 address: 192.0.2.1                         
[00:00:05.119,000] <inf> net_config: IPv6 address: 2001:db8::1                       
[00:00:05.341,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:08.350,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:10.354,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:12.357,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:15.363,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:17.366,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:20.371,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:23.377,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns         
[00:00:26.381,000] <err> eth_mcux: ENET_SendFrame error: 4004                        
[00:00:27.135,000] <err> eth_mcux: ENET_SendFrame error: 4004                        
[00:00:27.382,000] <wrn> net_gptp: Reset Pdelay requests                             
[00:00:27.385,000] <err> eth_mcux: ENET_SendFrame error: 4004                        
[00:00:28.144,000] <wrn> net_gptp: Multiple pdelay requests                          
[00:00:28.144,000] <err> eth_mcux: ENET_SendFrame error: 4004

hakehuang commented 3 years ago

@ainguraXmarquiegui I see the same problem, and this looks like the PC is not a good gptp server

NET_WARN("Not AS capable: %u ns > %u ns",
             (uint32_t)port_ds->neighbor_prop_delay,
             (uint32_t)port_ds->neighbor_prop_delay_thresh);

we expect a delay less than 100000 ns but it is 4294967295(0xFFFFFFFF) ns.

and then the 4004 issues is about there are too many frames send request.

kStatus_ENET_TxFrameBusy    = MAKE_STATUS(kStatusGroup_ENET, 4U), /*!< Tx buffer descriptors are under process. */

ainguraXmarquiegui commented 3 years ago

I have just tested your commit ( https://github.com/zephyrproject-rtos/zephyr/pull/35328 ) on my imxrt1020evk and on my FRDM-K64F.

On the FRDM-K64F board everything seems to work as it always did, but using the same gptp server on the imxrt1020evk I'm sad to report that I'm still getting the same errors I reported last time with the test code:

*** Booting Zephyr OS build v2.6.0-rc1-119-g8f05fbfeb2a5  ***                                                                          

[00:00:00.323,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:00.326,000] <inf> net_config: Initializing network                                                                              
[00:00:00.326,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...                                                      
[00:00:01.325,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:01.327,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:02.329,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:02.331,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:03.016,000] <inf> net_config: Interface 1 (0x80001008) coming up                                                                
[00:00:03.017,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.                                                                
[00:00:03.017,000] <inf> net_config: IPv4 address: 192.0.2.1                                                                           
[00:00:03.118,000] <inf> net_config: IPv6 address: 2001:db8::1                                                                         
[00:00:03.333,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:04.339,000] <wrn> net_gptp: Not AS capable: 345931786 ns > 100000 ns                                                            
[00:00:05.341,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:06.342,000] <wrn> net_gptp: Not AS capable: 346392526 ns > 100000 ns                                                            
[00:00:07.344,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:08.345,000] <wrn> net_gptp: Not AS capable: 346336434 ns > 100000 ns                                                            
[00:00:09.347,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:10.349,000] <wrn> net_gptp: Not AS capable: 345663571 ns > 100000 ns                                                            
[00:00:11.350,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:12.351,000] <wrn> net_gptp: Not AS capable: 346172229 ns > 100000 ns                                                            
[00:00:13.352,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:14.354,000] <wrn> net_gptp: Not AS capable: 346178201 ns > 100000 ns                                                            
[00:00:15.357,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:16.359,000] <wrn> net_gptp: Not AS capable: 346286604 ns > 100000 ns                                                            
[00:00:17.360,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:18.362,000] <wrn> net_gptp: Not AS capable: 346406719 ns > 100000 ns                                                            
[00:00:19.364,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:20.366,000] <wrn> net_gptp: Not AS capable: 345856974 ns > 100000 ns                                                            
[00:00:21.366,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:21.756,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:22.368,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:22.370,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:22.759,000] <wrn> net_gptp: Multiple pdelay requests                                                                            
[00:00:22.760,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:23.372,000] <wrn> net_gptp: Reset Pdelay requests

What can I do to help fix this?

kevin137 commented 3 years ago

Hi everyone. I just wanted to clarify that with the OpenAvnu/gPTP daemon configured as outlined in https://docs.zephyrproject.org/latest/samples/net/gptp/README.html#setting-up-linux-host, and incorporating the code in #35328, we still get the same overall behavior as always. Synchronization works correctly on the FRDM-K64F, but the same code compiled for the IMXRT020-EVK does NOT. We are definitely getting closer, ATCR is now doing what it should, but we are not there yet.

hakehuang commented 3 years ago

I have just tested your commit ( #35328 ) on my imxrt1020evk and on my FRDM-K64F.

*** Booting Zephyr OS build v2.6.0-rc1-119-g8f05fbfeb2a5  ***                                                                          

[00:00:00.323,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:00.326,000] <inf> net_config: Initializing network                                                                              
[00:00:00.326,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...                                                      
[00:00:01.325,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:01.327,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:02.329,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:02.331,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:03.016,000] <inf> net_config: Interface 1 (0x80001008) coming up                                                                
[00:00:03.017,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.                                                                
[00:00:03.017,000] <inf> net_config: IPv4 address: 192.0.2.1                                                                           
[00:00:03.118,000] <inf> net_config: IPv6 address: 2001:db8::1                                                                         
[00:00:03.333,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:04.339,000] <wrn> net_gptp: Not AS capable: 345931786 ns > 100000 ns                                                            
[00:00:05.341,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:06.342,000] <wrn> net_gptp: Not AS capable: 346392526 ns > 100000 ns                                                            
[00:00:07.344,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:08.345,000] <wrn> net_gptp: Not AS capable: 346336434 ns > 100000 ns                                                            
[00:00:09.347,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:10.349,000] <wrn> net_gptp: Not AS capable: 345663571 ns > 100000 ns                                                            
[00:00:11.350,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:12.351,000] <wrn> net_gptp: Not AS capable: 346172229 ns > 100000 ns                                                            
[00:00:13.352,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:14.354,000] <wrn> net_gptp: Not AS capable: 346178201 ns > 100000 ns                                                            
[00:00:15.357,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:16.359,000] <wrn> net_gptp: Not AS capable: 346286604 ns > 100000 ns                                                            
[00:00:17.360,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:18.362,000] <wrn> net_gptp: Not AS capable: 346406719 ns > 100000 ns                                                            
[00:00:19.364,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                           
[00:00:20.366,000] <wrn> net_gptp: Not AS capable: 345856974 ns > 100000 ns                                                            
[00:00:21.366,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:21.756,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:22.368,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:22.370,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:22.759,000] <wrn> net_gptp: Multiple pdelay requests                                                                            
[00:00:22.760,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:23.372,000] <wrn> net_gptp: Reset Pdelay requests

What can I do to help fix this?

@ainguraXmarquiegui , I guess this is an issue for gptp stack. @ceolin do you have any idea?

ainguraXmarquiegui commented 3 years ago

@hakehuang , I believe part of the imxrt1020evk related code is still not fully sorted for gptp usage. Like I said, none of those messages show up when testing on the FRDM-K64F board.

Some circumstantial evidence of differences between systems that support my suspicion:

I can synchronize clocks between by avnu gptp server on my PC and the FRDM-K64F board using neighborPropDelayThresh = 100000
I need a value of neighborPropDelayThresh = 1000000 with the imxrt1020evk in zephyr
I can use even lower values when using imxrt1020evk while running FreeRTOS

I have identified at least one key differences in the _enet_config structure for the imxrt1020evk files when comparing them to both the k64 in zephyr and the 1020 in freertos:

rxFifoStatEmptyThreshold is not present in the 1020 on zephyr. Looking a little bit deeper into the issue, I have discovered that the file modules/hal/nxp/mcux/devices/MIMXRT1021/MIMXRT1021_features.h is outdated when compared to the MCUXpresso version and it's missing lines such as this one:
```
#define FSL_FEATURE_ENET_HAS_RECEIVE_STATUS_THRESHOLD (1) 
```
that influence the behavior of the Ethernet device. I have tried adding that line to our test setup, but it's sadly not enough.

Is there any specific reason why nxp hal files aren't up to date? Is it possible that more changes could be missing that would make the issue go away?

ainguraXmarquiegui commented 3 years ago

Just in case it's useful at all, running the code on a imxrt1050evk results in the same type of error. See log:

*** Booting Zephyr OS build v2.6.0-rc1-118-g05d8c6fc78b2  ***                                                                                                                                                                                                          

[00:00:00.269,000] <wrn> net_if: iface 0x80001008 is down                                                                                                                                                                                                                      
[00:00:00.272,000] <inf> net_config: Initializing network                                                                                                                                                                                                                      
[00:00:00.272,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...                                                                                                                                                                                              
[00:00:01.271,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:01.273,000] <wrn> net_if: iface 0x80001008 is down                                                                                                                                                                                                                      
[00:00:02.275,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:02.278,000] <wrn> net_if: iface 0x80001008 is down                                                                                                                                                                                                                      
[00:00:03.011,000] <inf> net_config: Interface 1 (0x80001008) coming up                                                                                                                                                                                                        
[00:00:03.011,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.                                                                                                                                                                                                        
[00:00:03.011,000] <inf> net_config: IPv4 address: 192.0.2.1                                                                                                                                                                                                                   
[00:00:03.111,000] <inf> net_config: IPv6 address: 2001:db8::1                                                                                                                                                                                                                 
[00:00:03.111,000] <inf> net_config: IPv6 address: 2001:db8::1                                                                                                                                                                                                                 
[00:00:03.280,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:04.284,000] <wrn> net_gptp: Not AS capable: 363324374 ns > 100000 ns                                                                                                                                                                                                    
[00:00:05.285,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 100000 ns                                                                                                                                                                                                   
[00:00:06.760,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:07.288,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:07.764,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:08.293,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:08.768,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:09.297,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:09.775,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:10.301,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:10.780,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:11.306,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:11.787,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:12.310,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:12.791,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:13.314,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:13.796,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:14.319,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:14.800,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:15.323,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:15.805,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:16.327,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:16.808,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:17.331,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:17.814,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:18.336,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:18.817,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:19.340,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:19.823,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:20.344,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:20.828,000] <wrn> net_gptp: Multiple pdelay requests                                                                                                                                                                                                                    
[00:00:20.928,000] <err> net_pkt: Data buffer (54) allocation failed.                                                                                                                                                                                                          
[00:00:21.349,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:21.451,000] <err> net_pkt: Data buffer (54) allocation failed.                                                                                                                                                                                                          
[00:00:21.451,000] <err> net_gptp: Failed to prepare PDELAY_REQ                                                                                                                                                                                                                
[00:00:21.933,000] <err> net_pkt: Data buffer (54) allocation failed.                                                                                                                                                                                                          
[00:00:22.453,000] <wrn> net_gptp: Reset Pdelay requests                                                                                                                                                                                                                       
[00:00:22.555,000] <err> net_pkt: Data buffer (54) allocation failed.                                                                                                                                                                                                          
[00:00:22.555,000] <err> net_gptp: Failed to prepare PDELAY_REQ                                                                                                                                                                                                                
[00:00:22.938,000] <err> net_pkt: Data buffer (54) allocation failed.

jukkar commented 3 years ago

BTW, the value CONFIG_NET_GPTP_NEIGHBOR_PROP_DELAY_THR can be increased if the default value 100000 is too small.

ainguraXmarquiegui commented 3 years ago

Setting CONFIG_NET_GPTP_NEIGHBOR_PROP_DELAY_THR to the 4294967295 value seen above the system still does not work. Here's the new output:

*** Booting Zephyr OS build v2.6.0-rc1-118-g05d8c6fc78b2  ***                                                                  

[00:00:00.323,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:00.326,000] <inf> net_config: Initializing network                                                                              
[00:00:00.326,000] <inf> net_config: Waiting interface 1 (0x80001008) to be up...                                                      
[00:00:01.324,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:01.327,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:02.328,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:02.331,000] <wrn> net_if: iface 0x80001008 is down                                                                              
[00:00:03.012,000] <inf> net_config: Interface 1 (0x80001008) coming up                                                                
[00:00:03.012,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.                                                                
[00:00:03.013,000] <inf> net_config: IPv4 address: 192.0.2.1                                                                           
[00:00:03.114,000] <inf> net_config: IPv6 address: 2001:db8::1                                                                         
[00:00:03.332,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:04.752,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:05.013,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:05.338,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:05.737,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:06.340,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:06.342,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:06.741,000] <wrn> net_gptp: Multiple pdelay requests                                                                            
[00:00:06.741,000] <err> eth_mcux: ENET_SendFrame error: 4004                                                                          
[00:00:07.344,000] <wrn> net_gptp: Reset Pdelay requests                                                                               
[00:00:07.347,000] <err> eth_mcux: ENET_SendFrame error: 4004

If I reboot the system many times, most of them look like the code above, but once in a while I still get a message that looks like this:

[00:00:05.340,000] <wrn> net_gptp: Not AS capable: 4294967295 ns > 4294967295 ns

It might or might not be of any importance, but 4294967295 in hex is 0xFFFFFFFF by the way.