nxp-archive / openil

OpenIL is an open source project based on Buildroot and designed for embedded industrial solutions.

Require support to run NTP #23

Norman03 closed this issue 4 years ago

Norman03 commented 5 years ago

Hello,

We are currently using the LS1021ATSN switch with OpenIL. We need the steps to run NTP with the Internet connected to the non-switched TSN ports (eth0 and eth1), and to sync the global NTP time with the hardware time (including the switched TSN ports eth2 to eth5).

We also need RTC support, so that we get the current time at every boot. I understand there is no RTC on the LS1021ATSN switch; is there a workaround other than using external I2C hardware?

Are any OpenIL releases with this RTC fix planned? If so, when can we expect them?

Regards, Norman

vladimiroltean commented 5 years ago

There are many ways to set this up, depending on what you want, and Buildroot already supports several NTP clients. What stratum server will you be connecting to? Do you have a particular accuracy requirement? Do you need NTP, or is SNTP fine? Have you tried to set up e.g. chrony + timemaster in conjunction with phc2sys? I think you are confusing some terms when you ask about the RTC, or at least I don't understand the request for an OpenIL release with an "RTC fix". The RTC is a persistent clock source that is physically missing from the board. The cores can keep system time while powered on, but not while power is removed.

vladimiroltean commented 5 years ago

Why do you need both an RTC and NTP?

Norman03 commented 5 years ago

Thanks. As you suggested, I was able to recompile the image with the NTP kernel config enabled. I performed the steps below and still see a large phc2sys offset on my slave PCs:

  1. NXP switch as PTP master (synced with a global NTP stratum 0 server; I verified the current UTC time using the "date" command)
  2. Connected my PTP slave PCs to the switch (PTP master), and I can see nanosecond offsets
  3. But when I tried to sync the hardware clock (PTP) to the system clock using phc2sys, I see a large offset
  4. When I run the "date" command on the PCs, I see that their time has been set to the epoch, "1970-Jan-01". From this I can confirm that this time is distributed by the NXP switch PTP master to all the slaves.

Using NTP on the NXP switch, the system clock is updated with the current UTC time, but the hardware clock on the NXP switch is not synced. The hardware clock is still at the epoch, and PTP distributes this time to all the other slaves. Is my understanding correct?

Can you explain how to sync the NTP time to the hardware clock (which is still at the epoch) on the NXP switch?

Regards, Norman

vladimiroltean commented 5 years ago

> when I tried to sync the hardware clock (PTP) to the system clock using phc2sys, I am seeing a large offset

What command are you using for that? What happens if you run phc2sys -a -r -r -m?

Norman03 commented 5 years ago

I am using this command on the PC: phc2sys -s enp1s0 -w -mq

vladimiroltean commented 5 years ago

And what is the output/your expected result? The point in specifying -r twice is that phc2sys will serve the system time (disciplined by NTP) over PTP.

man phc2sys

       -r     Only  valid  together  with  the  -a  option. Instructs phc2sys to also synchronize the system clock (CLOCK_REALTIME). By default, the system clock is not considered as a possible time
              source. If you want the system clock to be eligible to become a time source, specify the -r option twice.
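When phc2sys serves the system clock (CLOCK_REALTIME, which runs on UTC) as a PTP time source, the PHC it disciplines is expected to run on the PTP timescale, which is TAI-based. The two therefore differ by the TAI-UTC offset, so a PHC that reads exactly the same as `date` is actually off by that amount. A minimal Python sketch of the conversion, assuming the 37 s offset that has applied since 2017-01-01 (older timestamps would need the historical leap-second table); the function names are hypothetical:

```python
from datetime import datetime, timezone

# Assumption: TAI - UTC = 37 s, valid from 2017-01-01 until the next leap second.
TAI_MINUS_UTC_S = 37


def utc_to_ptp_seconds(utc: datetime) -> int:
    """Whole seconds on the PTP (TAI-based) timescale for an aware UTC datetime."""
    if utc.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(utc.timestamp()) + TAI_MINUS_UTC_S


def ptp_seconds_to_utc(ptp_seconds: int) -> datetime:
    """Inverse of utc_to_ptp_seconds(), returned as an aware UTC datetime."""
    return datetime.fromtimestamp(ptp_seconds - TAI_MINUS_UTC_S, tz=timezone.utc)
```

For example, the UTC epoch (1970-01-01 00:00:00) maps to 37 s on the PTP timescale under this assumption.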
vladimiroltean commented 5 years ago

Just a hint: if you want to use the device as a true PTP switch (including swp0 - swp3) you might want to use a community kernel. There are issues at the moment which prevent that from being integrated into the OpenIL 4.14 kernel version.

Norman03 commented 5 years ago

I'm still getting the large offset.

Thanks for sharing the kernel. I'm compiling the SD card image with this kernel, and I will try to sync the NXP switch (PTP master) with another PC (PTP slave) and sync the PC system clock (phc2sys) with the PC's PTP hardware clock.

Norman03 commented 5 years ago

@vladimiroltean As you suggested, I cloned the community kernel and configured it with make menuconfig. During the image compilation, the EULA package takes a very long time to compile, and even after many hours there was no progress. See the screenshot in the attachment.

Do you have any idea on this?

[screenshot]

vladimiroltean commented 5 years ago

On one hand, you read 'ELUA' with a typo: it is eLua (embedded Lua interpreter) rather than EULA (end user license agreement). On the other hand, I have no idea why you are building the efl package in the first place. Are you sure you are building for the nxp_ls1021atsn_defconfig target?

But you do have a point, which is that it's not trivial to test the community kernel. You shouldn't expect it to be trivial, but that doesn't mean the OpenIL target for the LS1021A-TSN isn't slightly messy right now; it is. So I took some time, forked the OpenIL build system, and made it build the community kernel by default (along with many other cleanups and upgrades that were necessary to support newer packages such as iproute2-next).

I've build-tested the thing a few times, and I ran it for almost a day, but it's still possible there might be some issues. Please let me know if you're facing any problems with it.

Of course, to compile, just do:

make nxp_ls1021atsn_defconfig
make
# use output/images/sdcard.img
Norman03 commented 5 years ago

@vladimiroltean While building sdcard.img with the commands you suggested above, there were build errors while ">>> host-nodejs 10.16.3" was building. [screenshot]

openIL version(community kernel): https://github.com/vladimiroltean/openil

Commands used to build:

make nxp_ls1021atsn_defconfig
make

Can you please help me to solve this?

vladimiroltean commented 5 years ago
/usr/bin/ld: final link failed: Symbol needs debug section which does not exist

That suggests a problem with the toolchain. I have not encountered that while building. Did you by any chance update the repository in the middle of the compilation process? I made some force-pushes to it, including a toolchain change (externally downloaded -> compiled by buildroot) since it was needed for a change in kernel headers. What is the HEAD of your master branch currently pointing at (git show)? It should be at 846349af3cac99f095f2f52d0773ed588b512f35 (board: nxp: ls1021a-tsn: Use linux-headers 5.2 package from kernel.org) If it isn't pointing to this commit, could you please restart the build with the latest settings so that all packages are in a consistent state? That would entail:

rm -rf output
git fetch origin
git reset --hard origin/master
make nxp_ls1021atsn_defconfig
make

Sorry for the trouble with the force-pushing. I'll do further changes on a devel branch and try to keep a linear history for master.

Norman03 commented 5 years ago

@vladimiroltean 846349a is the commit the current build is using. In the kernel .config file I changed BR2_PACKAGE_NODEJS=y to "BR2_PACKAGE_NODEJS is not set".

Basically, I skipped the nodejs package, and now the build is progressing. Is it a hard dependency?

vladimiroltean commented 5 years ago

Ok, let me rephrase. Does this error occur with a completely clean build?

No, nodejs is a web server runtime. You don't need it. And it's not kernel config, it's buildroot config.

Norman03 commented 5 years ago

> Ok, let me rephrase. Does this error occur with a completely clean build?

Just now I started a rebuild after cleaning and fetching the latest master. I'll let you know if any error occurs.

> No, nodejs is a web server runtime. You don't need it. And it's not kernel config, it's buildroot config.

Ok.

Thanks for your prompt support.

Norman03 commented 5 years ago

@vladimiroltean Meanwhile, I have a question: after successfully building the SD card image (community kernel), I am able to boot the LS1021ATSN switch. During boot I see the following logs. [screenshot]

From this I understand that:

  1. We have to set the MAC address for every Ethernet port in U-Boot

Why does the EEPROM have an invalid ID? Am I doing anything wrong here?

vladimiroltean commented 5 years ago

The EEPROM has invalid ID because that's how it comes out of the factory. I don't know why beyond that. It is documented in the U-Boot board README file how you can set the MAC addresses persistently. You only do it once.

Norman03 commented 5 years ago

@vladimiroltean, Now I'm trying to run a PTP master on the LS1021ATSN switch and connect the other PCs as PTP slaves on the TSN switched ports. [screenshot]

Master: ptp4l -i eth2 -2 -mq
Slave: ptp4l -i eth1 -2 -mq -s

The slave was not able to detect the master node.

vladimiroltean commented 5 years ago

So the community version of OpenIL for the LS1021A-TSN doesn't use /etc/init.d, but systemd (as you'll find out if you open the README file in that folder).

With the switch ports in the DSA kernel driver, you are not supposed to run ptp4l over eth2 (which is only a control interface). You are supposed to run ptp4l over swp2, swp3, swp4, swp5. Look around first: list the Ethernet interfaces, make sure they are up (eth2 needs to be up in order for the switch net devices to be up), put an IP on br0, see if you can ping, etc. Read the DSA documentation and the driver documentation.

Then finally see the /lib/systemd/system/linuxptp.service and /etc/linuxptp.cfg files. The linuxptp service has been customized for the switch to operate as a P2P_TC, since the device's primary use case is as a switch. It cannot be a grandmaster in this mode. Switches can only be grandmasters in 802.1AS, which linuxptp does not support yet. So you might need to change it. The linuxptp-system-clock service (phc2sys) has not been customized at all; you will definitely need to adapt that. To activate the services:

systemctl enable linuxptp
systemctl start linuxptp
systemctl enable linuxptp-system-clock
systemctl start linuxptp-system-clock

To monitor them:

journalctl -b -u linuxptp -f
journalctl -b -u linuxptp-system-clock -f
Norman03 commented 5 years ago

@vladimiroltean The br0 interface is up and I am able to ping the slave PCs from the LS1021ATSN switch. Now I have connected one PC (PTP master) to the LS1021ATSN switch (PTP slave) and started the PTP service. The switch does not send delay responses, and I don't see the offset in the log file.

[screenshot]

The log file shows "selected local clock 00049f.fffe.ef0606 as best master".

vladimiroltean commented 5 years ago

Up? No. You need to bring it up with ip link set dev br0 up (same as all others, I think). Present? Yes, due to the systemd-networkd configuration files that are pre-installed.


vladimiroltean commented 5 years ago
selected /dev/ptp0 as PTP clock

That is the eTSEC PTP clock (eth0, eth1). Which ports have you kept in /etc/linuxptp.cfg? It is possible to make the device be a switch across both /dev/ptp0 and /dev/ptp1, but it is a bit more complicated: you will need another instance of phc2sys that keeps them in sync. Can you draw a diagram of the 1588 network you're trying to establish, so I can help you customize the ptp4l daemon accordingly? Also please make sure that all devices are speaking the same protocol (1588 - not 802.1AS, L2 transport, peer delay).

Norman03 commented 5 years ago

@vladimiroltean, I am trying to bring up this architecture.

ptp4l configuration file: [screenshot]

Architecture: [screenshot]

vladimiroltean commented 5 years ago

Ok, so swp2 is a 1588 endpoint (ordinary clock)? In that case, what happens if you try the following in /etc/linuxptp.cfg:

[global]
delay_mechanism     P2P
clock_type      OC
network_transport   L2
time_stamping       hardware
step_threshold      1.0
tx_timestamp_timeout    10

[swp2]

Please keep in mind that the UDPv4 transport you specified in the above picture will not work. You need to match transports in the entire PTP network, and the switch ports only support L2.

Norman03 commented 5 years ago

@vladimiroltean I tried that, but I'm still getting the same logs. I'm trying to understand the "selected local clock 00049f.fffe.ef0808 as best master" log. [screenshot]

vladimiroltean commented 5 years ago

That message means that a timeout expired and the port saw no ANNOUNCE messages (or there were ANNOUNCE messages of lower priority) and BMCA decided that the grandmaster should be itself.
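The BMCA decision behind that log line can be pictured as a lexicographic comparison of the candidates' announce data, where the local clock is always one of the candidates. A simplified Python sketch: the field order follows the IEEE 1588 dataset comparison (priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, clockIdentity, lower wins), but stepsRemoved and the topology tie-breakers are omitted, and the class and function names are hypothetical. With an empty list of foreign masters, which is what happens when no ANNOUNCE arrives before the timeout, the local clock is selected:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AnnounceData:
    """Subset of the fields BMCA compares (simplified: no stepsRemoved/topology)."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str  # fixed-width lowercase hex, e.g. "00049f.fffe.ef0808"

    def rank(self) -> tuple:
        # Lower wins for every field, compared in this order. Fixed-width hex
        # strings sort the same way as the underlying 8-byte identities.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def best_master(local: AnnounceData,
                foreign: Sequence[AnnounceData]) -> AnnounceData:
    """Best clock among the local clock and the foreign masters heard so far."""
    return min([local, *foreign], key=AnnounceData.rank)
```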

Let me ask you again: Is eth2 up? Is swp2 up? They both need to be brought up, in this order.
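The order matters because, with DSA, the switch net devices can only come up once the CPU-facing interface (eth2 here) is up. A small Python helper that lists the `ip link` commands in that order and then brings up the bridge; the helper names and the `dry_run` switch are hypothetical, and actually running the commands requires root:

```python
import subprocess
from typing import List, Sequence


def bring_up_commands(cpu_iface: str, switch_ports: Sequence[str],
                      bridge: str = "br0") -> List[List[str]]:
    """`ip link` commands in DSA order: CPU-facing interface, switch ports, bridge."""
    order = [cpu_iface, *switch_ports, bridge]
    return [["ip", "link", "set", "dev", name, "up"] for name in order]


def bring_up(cpu_iface: str, switch_ports: Sequence[str], bridge: str = "br0",
             dry_run: bool = True) -> List[List[str]]:
    """Return the commands; execute them in order unless dry_run is set."""
    cmds = bring_up_commands(cpu_iface, switch_ports, bridge)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

For the setup discussed here, `bring_up("eth2", ["swp2"])` yields the commands for eth2, swp2 and br0, in that order.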

Norman03 commented 5 years ago

@vladimiroltean I can see that the interfaces are up.

[screenshot: ifconfig output]

vladimiroltean commented 5 years ago

Ok, I can reproduce the issue. Let me see what's going on.

vladimiroltean commented 5 years ago

My mistake: I did not actually reproduce an issue, I was testing on another port. It does work fine for me. I made a mistake and wrote tx_timestamp_threshold instead of tx_timestamp_timeout above, but I have corrected it. Can you share more details about the PTP master? I still don't think they are using the same protocol. If you stop the linuxptp service and run tcpdump -i swp2, do you see anything?

Norman03 commented 5 years ago

@vladimiroltean I am using the ptp4l -i <INFNAME> -mq -2 command on the PTP master. After stopping the ptp4l service on the LS1021ATSN, I see this: [screenshot]

vladimiroltean commented 5 years ago

If you had set up the devices as per the above suggestion, you would have received these warnings: On the switch:

ptp4l[3684.670]: port 1: delay request on P2P port

and on the Intel card:

ptp4l[6657118.241]: port 1: pdelay_req on E2E port

By looking at tcpdump I am now convinced that the Intel device speaks E2E over L2, like you indicated. But I am still not convinced that the switch speaks the same protocol. With the service still disabled, what happens if you run this on the switch:

[root@OpenIL ~] # ptp4l -i swp2 -2 -m --tx_timestamp_timeout 10 -s
ptp4l[3953.028]: selected /dev/ptp1 as PTP clock
ptp4l[3953.130]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[3953.131]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[3953.458]: port 1: new foreign master 6805ca.fffe.39cdca-1
ptp4l[3957.458]: selected best master clock 6805ca.fffe.39cdca
ptp4l[3957.458]: running in a temporal vortex
ptp4l[3957.458]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[3960.458]: master offset -18675993397429258 s0 freq  +16672 path delay      3556
ptp4l[3961.458]: master offset -18675993397459726 s1 freq  -13797 path delay      3760
ptp4l[3962.458]: master offset        170 s2 freq  -13627 path delay      3760
ptp4l[3962.458]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[3963.458]: master offset        746 s2 freq  -13000 path delay      3232
ptp4l[3964.458]: master offset        154 s2 freq  -13368 path delay      3232
ptp4l[3965.458]: master offset       2068 s2 freq  -11408 path delay      1118
ptp4l[3966.458]: master offset        814 s2 freq  -12041 path delay       180
ptp4l[3967.458]: master offset       -432 s2 freq  -13043 path delay      -142
ptp4l[3968.458]: master offset       -714 s2 freq  -13455 path delay      -372
ptp4l[3969.458]: master offset       -440 s2 freq  -13395 path delay      -782
ptp4l[3970.458]: master offset       -664 s2 freq  -13751 path delay      -742
ptp4l[3971.458]: master offset       -496 s2 freq  -13782 path delay      -742
ptp4l[3972.458]: master offset       -288 s2 freq  -13723 path delay      -742
ptp4l[3973.458]: master offset       -104 s2 freq  -13625 path delay      -782
ptp4l[3974.458]: master offset        -36 s2 freq  -13588 path delay      -802
ptp4l[3975.458]: master offset         30 s2 freq  -13533 path delay      -844
ptp4l[3976.458]: master offset          6 s2 freq  -13548 path delay      -844
ptp4l[3977.458]: master offset         -8 s2 freq  -13560 path delay      -854
ptp4l[3978.458]: master offset        -32 s2 freq  -13587 path delay      -854
ptp4l[3979.458]: master offset         -8 s2 freq  -13572 path delay      -854
ptp4l[3980.458]: master offset         -8 s2 freq  -13575 path delay      -854
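The log above can be checked mechanically. In linuxptp the servo state after the offset is s0 (unlocked), s1 (clock stepped) or s2 (locked); the first two samples show a huge initial offset (roughly 216 days) that is stepped away before the servo locks. A Python sketch that parses these lines and summarizes the locked samples; the regular expression assumes exactly the line format shown above, and the helper names are hypothetical:

```python
import math
import re
from typing import Iterable, List, Sequence, Tuple

# Matches lines such as:
# ptp4l[3960.458]: master offset -18675993397429258 s0 freq  +16672 path delay      3556
LINE_RE = re.compile(
    r"master offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)")

Sample = Tuple[int, int, int, int]  # (offset_ns, servo_state, freq_ppb, path_delay_ns)


def parse(lines: Iterable[str]) -> List[Sample]:
    """Extract one sample per 'master offset' line; other lines are ignored."""
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            samples.append(tuple(int(g) for g in m.groups()))
    return samples


def locked_stats(samples: Sequence[Sample]) -> Tuple[float, int]:
    """RMS and maximum absolute offset (ns) over samples taken in state s2."""
    offsets = [s[0] for s in samples if s[1] == 2]
    if not offsets:
        raise ValueError("no locked (s2) samples")
    rms = math.sqrt(sum(o * o for o in offsets) / len(offsets))
    return rms, max(abs(o) for o in offsets)


def ns_to_days(ns: int) -> float:
    """Magnitude of a nanosecond offset expressed in days."""
    return abs(ns) / 1e9 / 86400
```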
Norman03 commented 5 years ago

@vladimiroltean If I run:

Master (Intel card): ptp4l -i <INFNAME> -mq -2
Slave (LS1021ATSN): ptp4l -i swp2 -2 -m --tx_timestamp_timeout 10 -s

I get the same master offset as you.

vladimiroltean commented 5 years ago

Well, I guess problem solved, then? Just transpose the slave settings into a blank /etc/linuxptp.cfg file:

While you add these settings to the config file you will probably notice which one was set incorrectly.
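Transposing command-line options into the config file is mostly mechanical, since ptp4l's long options are config keys and several short flags set a single key. A Python sketch that performs this translation for the flags used in this thread; the flag-to-key table is an assumed subset of ptp4l's options (newer linuxptp releases call slaveOnly clientOnly), so double-check it against `man ptp4l`. The function name is hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

# Assumed subset of ptp4l short flags and the config key each one sets.
SHORT_FLAGS: Dict[str, Tuple[str, str]] = {
    "-2": ("network_transport", "L2"),
    "-4": ("network_transport", "UDPv4"),
    "-6": ("network_transport", "UDPv6"),
    "-E": ("delay_mechanism", "E2E"),
    "-P": ("delay_mechanism", "P2P"),
    "-s": ("slaveOnly", "1"),
    "-m": ("verbose", "1"),
    "-q": ("use_syslog", "0"),
}


def cli_to_config(argv: Sequence[str]) -> str:
    """Translate ptp4l arguments (without the program name) into a config file body."""
    global_opts: Dict[str, str] = {}
    interfaces: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-i":
            interfaces.append(argv[i + 1])  # each -i becomes an [iface] section
            i += 2
        elif arg.startswith("--"):
            global_opts[arg[2:]] = argv[i + 1]  # --key value maps to "key value"
            i += 2
        elif arg in SHORT_FLAGS:
            key, value = SHORT_FLAGS[arg]
            global_opts[key] = value
            i += 1
        elif (arg.startswith("-") and len(arg) > 2
              and all(f"-{c}" in SHORT_FLAGS for c in arg[1:])):
            for c in arg[1:]:  # combined flags such as -mq
                key, value = SHORT_FLAGS[f"-{c}"]
                global_opts[key] = value
            i += 1
        else:
            raise ValueError(f"unsupported option: {arg}")
    lines = ["[global]"] + [f"{k}\t{v}" for k, v in global_opts.items()]
    for iface in interfaces:
        lines += ["", f"[{iface}]"]
    return "\n".join(lines) + "\n"
```

For the slave command that worked above, `cli_to_config(["-i", "swp2", "-2", "-m", "--tx_timestamp_timeout", "10", "-s"])` gives a `[global]` section with network_transport L2, verbose 1, tx_timestamp_timeout 10 and slaveOnly 1, followed by an empty `[swp2]` section. Passing `-i` several times produces one section per port, which is how several switch ports end up in a single ptp4l instance.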

Norman03 commented 5 years ago

Done! Things are working now. If I need to add a slave on swp3, I have to run the command below on my slave PC, with its Intel card connected to swp3. Am I right?

ptp4l -i <IFNAME> -m --tx_timestamp_timeout 10 -2 -s -P

vladimiroltean commented 5 years ago

No, if you want to add further slave devices to the switch, then the clock_type is no longer an OC (ordinary clock), but either a BC (Boundary Clock) or a P2P_TC (Transparent Clock), because you want the switch not only to synchronize to the master*, but also relay time to the other slaves. And you don't start another ptp4l instance, you just specify multiple interfaces (-i swp2 -i swp3). So look at the variety of predefined linuxptp configs, and pick your assortment.

*When in P2P_TC mode, the transparent clock does not synchronize to the master unless free_running is 0.

Norman03 commented 5 years ago

@vladimiroltean TSN requires switches that are 802.1AS compliant. Is it possible to time sync using 802.1AS?

vladimiroltean commented 5 years ago

I don't really understand 802.1AS, but I think it is a misconception that TSN requires it. TSN requires synchronized clocks across the network. This is so that switches may enforce time-based admission control and scheduling for offloaded traffic. And 1588 does the job just fine for that. Anyway, as far as my 802.1AS understanding goes, the synchronization is only 'logical', since all devices timestamp based on free-running clocks, and correct those based on cumulativeScaledRateOffset from the follow up information TLV that is present in the PTP messages. But since all PTP hardware clocks are fundamentally still free-running, how would 802.1Qbv/802.1Qci still work? I think the goals are conflicting. Please change my mind.
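To make the "logical synchronization" point concrete: as we understand 802.1AS, each hop accumulates the ratio of grandmaster frequency to local frequency (rateRatio) and carries it in the Follow_Up information TLV as cumulativeScaledRateOffset = (rateRatio - 1) * 2^41, so an interval measured on a free-running local clock is scaled into grandmaster time instead of the local clock itself being adjusted. A Python sketch of that encoding; the function names are hypothetical and the floor plus Integer32 range check follow our reading of the standard:

```python
import math

SCALE = 2 ** 41  # scaling used by 802.1AS for cumulativeScaledRateOffset


def encode_rate_ratio(rate_ratio: float) -> int:
    """(rateRatio - 1) * 2**41, rounded down and checked against the Integer32 range."""
    value = math.floor((rate_ratio - 1.0) * SCALE)
    if not -(2 ** 31) <= value < 2 ** 31:
        raise OverflowError("rate ratio does not fit in an Integer32 field")
    return value


def decode_rate_ratio(scaled_offset: int) -> float:
    """Inverse of encode_rate_ratio(), up to the rounding error."""
    return 1.0 + scaled_offset / SCALE


def local_to_gm_interval(local_interval_ns: float, scaled_offset: int) -> float:
    """Scale a duration measured on the free-running local clock into grandmaster time."""
    return local_interval_ns * decode_rate_ratio(scaled_offset)
```

With a 20 ppm frequency difference, one local second corresponds to roughly 1.00002 s of grandmaster time, while the local PHC itself keeps running free.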

Norman03 commented 5 years ago

Two major points:

  1. The frames of IEEE 1588 are different from those of 802.1AS
  2. The equations for deriving the peer delay (Pdelay) response are not the same in IEEE 1588 and 802.1AS

Considering the above points, and since 802.1AS is a more advanced profile than IEEE 1588, you cannot connect IEEE 1588 and 802.1AS devices together directly. A bridge is needed to connect both in the same network, basically special hardware.

The TSN standards are 802.1 (AS, Qbv, Qbu, etc.), so a TSN device should support 802.1AS.
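For reference, the two peer-delay computations differ mainly in one factor. With t1/t4 the Pdelay_Req transmit and Pdelay_Resp receive times at the requester, and t2/t3 the Pdelay_Req receive and Pdelay_Resp transmit times at the responder, IEEE 1588 computes ((t4 - t1) - (t3 - t2)) / 2, while 802.1AS, as we understand it, first scales the requester's interval by neighborRateRatio r: (r * (t4 - t1) - (t3 - t2)) / 2. A minimal Python sketch (function names hypothetical):

```python
def mean_path_delay_1588(t1: float, t2: float, t3: float, t4: float) -> float:
    """IEEE 1588 peer delay: half of the round trip minus the responder's turnaround."""
    return ((t4 - t1) - (t3 - t2)) / 2


def mean_link_delay_8021as(t1: float, t2: float, t3: float, t4: float,
                           neighbor_rate_ratio: float) -> float:
    """802.1AS variant (as we read it): the requester's interval is scaled by
    neighborRateRatio so both intervals are in the responder's timebase."""
    return (neighbor_rate_ratio * (t4 - t1) - (t3 - t2)) / 2
```

A constant offset between the two clocks cancels in both formulas because t2 and t3 only appear as a difference; the frequency ratio is the only extra ingredient 802.1AS adds here.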

vladimiroltean commented 5 years ago

Point #1 is circular and does not in fact bring any argument: you need to support 802.1AS because others support 802.1AS too, but not why 802.1AS itself would be better. Point #2 does not explain what the benefits of the 802.1AS synchronization algorithms are. Furthermore, my current understanding of them tells me that they cannot possibly interoperate with the goals of 802.1Qbv. If you can explain how the two can be reconciled, and whether an 802.1AS bridge with a hardware-synchronized PTP clock can be built (which can be used to trigger gate open events for Qbv), I'm all ears.

Norman03 commented 5 years ago

@vladimiroltean By following the steps above, I'm able to see ptp4l time synchronization, but it's not stable. The linuxptp service has been customized for the switch to operate as a P2P_TC, and other PCs (PTP slaves) are connected to the switch. After 20 minutes, the linuxptp log shows "tc failed to forward message in port 1". [screenshot]

vladimiroltean commented 5 years ago

Is there an associated kernel log to go along with this error? Or are the error messages restricted to ptp4l? Could you share the exact configuration file?

Norman03 commented 5 years ago

@vladimiroltean There is no kernel log related to this.

/etc/linuxptp.cfg

[global]
slaveOnly                       1
delay_mechanism         P2P
network_transport          L2
tx_timestamp_timeout    20
clock_type                      P2P_TC
utc_offset                        36

[swp2]
[swp3]
[swp4]

When running the ptp4l command instead of the service, I see 'port2: the link is down', but the interfaces are up and the master is running. I don't understand the issue here. Could you please help me solve this?

[screenshot]

vladimiroltean commented 5 years ago

> There is no kernel log related to this.

Ok, so it is an application-level issue.

So you are running ptp4l on swp2, swp3 and swp4, but the link is down on one of the interfaces. You need to think about what a transparent clock does. It receives sync frames on one port, and forwards them on all other ports, since it has no way of knowing which ports have slaves interested in those frames and which don't. In effect, that means it will attempt to send frames even over interfaces that are down. I see your configuration file is missing tc_spanning_tree 1, which in effect starts keeping track of the topology and should avoid sending frames on interfaces it does not need to. Here is the explanation with which this setting was added:

commit e6af4608c4d672490398a8cbcb17b8ee5033c191
Author: Richard Cochran <richardcochran@gmail.com>
Date:   Mon Apr 16 16:20:06 2018 -0700

    config: Add a configuration option for preventing loops in TC mode.

    According to 1588, PTP message loops are simply someone else's problem
    with respect to transparent clocks.  Since we are running the BMCA for
    syntonization anyway, we might as well go ahead and implement the spanning
    tree for PTP messages.

    Signed-off-by: Richard Cochran <richardcochran@gmail.com>

So I guess you need to enable the tc_spanning_tree setting, and then figure out why your link is down.
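The difference tc_spanning_tree makes can be modeled roughly as follows. This is a simplified Python sketch for Sync-type messages only, not linuxptp's implementation; the port states are the usual BMCA states, and the class and function names are hypothetical. Without the spanning tree, a message is relayed to every port except the ingress one, including a port whose link is down (modeled here as FAULTY); with it, only a message arriving on the slave-side port is relayed, and only to ports in MASTER state:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Port:
    name: str
    state: str  # simplified BMCA port state, e.g. "SLAVE", "MASTER", "FAULTY"


def egress_ports(ingress: Port, ports: Sequence[Port],
                 spanning_tree: bool) -> List[str]:
    """Names of the ports a transparent clock would relay a Sync message to."""
    if spanning_tree and ingress.state not in ("SLAVE", "UNCALIBRATED"):
        return []  # not arriving from the grandmaster side of the tree
    out = []
    for port in ports:
        if port.name == ingress.name:
            continue  # never send back out of the ingress port
        if spanning_tree and port.state != "MASTER":
            continue  # e.g. a FAULTY port whose link is down is skipped
        out.append(port.name)
    return out
```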

Norman03 commented 5 years ago

To identify the problem, I simplified the architecture: I connected the ptp4l master PC interface to swp2 (slave) on the switch and started synchronization (no other slaves are connected; the network has one slave, the LS1021ATSN, and one master). After a few minutes I see a "rogue peer delay response" log in the master PC terminal, and the same on the switch, as shown below.

[screenshot]

vladimiroltean commented 5 years ago

A few days ago there was this discussion on linuxptp-users related to rogue peer delay responses. I wonder whether the issue is the same.

Norman03 commented 5 years ago

@vladimiroltean Do you have any idea on "rogue peer delay responses" on master PC?

Norman03 commented 5 years ago

@vladimiroltean The rogue peer delay response error is fixed. Now I'm not able to sync all the slave PCs, but the switch is working as a slave and synchronization is done. The P2P_TC clock type is set on the switch, but the slave PCs' log says "selected best master clock 6805ca.fffe.8cfd92".

[screenshot]

slave PC command: ptp4l -i -mq -s -P

vladimiroltean commented 5 years ago

> synchronization is done

Synchronization is never "done", it is a continuous process. You mean that the switch prints selected best master clock 6805ca.fffe.8cfd92 and then stops? I have seen that behavior before, but right now I can't seem to be able to reproduce it. I've been running the P2P_TC on a switch for a couple of hours now and it still works:

Feb 15 12:30:01 OpenIL ptp4l[548]: [94686.287] rms   13 max   18 freq -19744 +/-  11 delay   745 +/-   1
Feb 15 12:30:03 OpenIL ptp4l[548]: [94688.287] rms    3 max    4 freq -19740 +/-   1 delay   746 +/-   0

What are your linuxptp endpoint (master and slave) settings and software version? Mine are:

[global]
#
# Default Data Set
#
slaveOnly       0
socket_priority     0
#
# Run time options
#
tx_timestamp_timeout    10
#
# Servo Options
#
step_threshold      0.00002
first_step_threshold    0.00002
#
# Default interface options
#
clock_type      OC
network_transport   L2
delay_mechanism     P2P
#
# Clock description
#
productDescription  ;;
revisionData        ;;
manufacturerIdentity    00:00:00
userDescription     ;
timeSource      0xA0

Does the synchronization on the switch stop spontaneously, or is it anything in particular that triggers it? Does it stop when no slaves are connected? 1 slave? 2 slaves? Can you collect another ptp4l log when it stops, but with "-l 7"?

vladimiroltean commented 4 years ago

I managed to reproduce it after all. Looks like increasing the logSyncInterval makes it reproduce faster. The ptp4l process appears to completely freeze, although I don't understand why yet. I will come back with a conclusion after some debugging.
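For scale: logSyncInterval is the base-2 logarithm of the Sync period in seconds, so each step of -1 doubles the Sync (and, for two-step clocks, Follow_Up) traffic the transparent clock has to handle. A trivial Python helper to put numbers on that; the names are hypothetical:

```python
def sync_period_s(log_sync_interval: int) -> float:
    """Sync period in seconds; logSyncInterval is a base-2 logarithm."""
    return 2.0 ** log_sync_interval


def syncs_in(duration_s: float, log_sync_interval: int) -> float:
    """Number of Sync messages a master sends (and a TC relays per egress port)."""
    return duration_s / sync_period_s(log_sync_interval)
```

At logSyncInterval -3, for instance, the period is 0.125 s, which is 28800 Sync messages per hour instead of 3600 at the default of 0.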