nxp-archive / openil

OpenIL is an open source project based on Buildroot and designed for embedded industrial solution.

LS1021ATSN - Problems to synchronise my IEEE 802.1AS network #59

Closed diegotxegp closed 4 years ago

diegotxegp commented 4 years ago

Hi there,

I would like to ask for help synchronising my IEEE 802.1AS network.

My network consists of an LS1021ATSN platform with two boards connected to it, on the ETH2 and ETH3 ports. I chose these ports because they belong to the SJA1105TEL switch, which supports IEEE 802.1AS. All the devices meet the requirements to perform IEEE 802.1AS (HW timestamping...). In addition, they have IP addresses and can ping one another.

                                                   LS1021ATSN
                                        (ETH2 port)       (ETH3 port)
                                                /                \
                                              /                    \
                                 (enp3s0 port)              (enp3s0 port)
                                      NODE 1                      NODE 2

To synchronise it, I am following the OpenIL User Guide 1.8 released in May 2020, without success. I want the LS1021ATSN to act as grandmaster, broadcasting its time, with the nodes as slaves. In IEEE 802.1AS terms, the LS1021ATSN would be a time-aware bridge and the nodes time-aware end stations.

On the LS1021ATSN, I run this command:

ptp4l -i eth2 -f /etc/ptp4l_cfg/gPTP.cfg -2 -m

(eth2 because it is the port which connects the LS1021 processor to the SJA1105TEL switch. I have seen that in the new manual you can also choose swp2, swp3, swp4, swp5, but I get identical results if I use -i swp2 -i swp3.)

On each node, I run this command:

ptp4l -i enp3s0 -2 -m

On both the LS1021ATSN and the nodes, I pass -2 because I want to synchronise over raw Ethernet, and not over UDP/IPv4. In any case, I tried it without success.

Problem: each device selects its own clock as grandmaster, and nothing else happens.

I ask for help to synchronise the above network because we want to continue with the project.

Thank you very much for reading. Any suggestion is welcome.

Diego G.

vladimiroltean commented 4 years ago

Hi Diego, You cannot just run ptp4l -i enp3s0 -2 -m and interoperate with the LS1021A-TSN running 802.1AS, because there are more on-the-wire fields that are incompatible than just the transport (follow_up_info, transportSpecific, delay_mechanism). You need to use the same gPTP.cfg as you use on the switch board. Like this:

ptp4l -i enp3s0 -f gPTP.cfg -m

Needless to say, please make sure you have a recent version of ptp4l on those other stations (ideally you would use the same ptp4l version, i.e. the one from OpenIL, but you could also use the one from the mainline master branch). I would say not to rely on the linuxptp package shipped by the distribution, especially if that is an older version of Ubuntu.

Also, if you use gPTP.cfg, then specifying "-2" is redundant, since network_transport is already L2 in the config file.
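
To make the mismatch concrete, here is a minimal Python sketch (illustrative only, not part of linuxptp or OpenIL). It compares the options a station ends up with when started with only "-2" against an excerpt of gPTP.cfg. Assumptions: the values in `PTP4L_DEFAULTS` are the defaults documented in the ptp4l man page, the excerpt reflects the stock linuxptp gPTP.cfg (check your own copy), and `parse_cfg` and the variable names are made up for this example.

```python
# Built-in ptp4l defaults for the options discussed above
# (assumption: values as documented in the ptp4l man page).
PTP4L_DEFAULTS = {
    "network_transport": "UDPv4",
    "delay_mechanism": "E2E",
    "transportSpecific": "0x0",
    "follow_up_info": "0",
}

# Assumed excerpt of the stock linuxptp gPTP.cfg; verify against your own file.
GPTP_CFG_EXCERPT = """\
[global]
follow_up_info      1
transportSpecific   0x1
network_transport   L2
delay_mechanism     P2P
"""


def parse_cfg(text):
    """Parse 'option value' lines, skipping blanks, comments and [sections]."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            options[parts[0]] = parts[1].strip()
    return options


# A station started as "ptp4l -i enp3s0 -2 -m": built-in defaults, with only
# the transport switched to L2 by the "-2" flag.
only_dash_2 = dict(PTP4L_DEFAULTS, network_transport="L2")
gptp = parse_cfg(GPTP_CFG_EXCERPT)

# Options whose effective value still differs from the gPTP.cfg excerpt.
mismatched = sorted(k for k in gptp if only_dash_2.get(k) != gptp[k])
print("options still differing from gPTP.cfg:", ", ".join(mismatched))
```

Under these assumptions, "-2" fixes only the transport; the remaining options still differ, which is why passing the same gPTP.cfg with "-f" on every station is the simpler route.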

Regards, -Vladimir

diegotxegp commented 4 years ago

Thank you very much for your response, Vladimir.

  1. If I must use "sudo ptp4l -i enp3s0 -f gPTP.cfg -m", the OpenIL User Guide is wrong. See section 5.6.2 Time-aware bridge verification.

  2. I have checked the ptp4l version on all the devices (LS1021ATSN, NODE 1, NODE 2), and they all have linuxptp version 2.0. On the nodes, I downloaded the linuxptp package from the website and not from the apt repository.

  3. Understood about "-2" being redundant.

With your comments taken into account, I tried again, without success.

LS1021ATSN: ptp4l -i eth2 -f /etc/ptp4l_cfg/gPTP.cfg -m

Node 2 (I tried only one node): sudo ptp4l -i enp3s0 -f /home/node1/linuxptp-2.0/configs/gPTP.cfg -m

Result of LS1021ATSN after command:

nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /etc/ptp4l_cfg/gPTP.cfg -m
ptp4l[298.447]: selected /dev/ptp0 as PTP clock
ptp4l[298.540]: driver rejected most genral HWSTAMP filter
ptp4l[298.540]: ioctl SIOCSHWSTAMP failed: Device or resource busy
ptp4l[298.620]: port 1: INITIALIZING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[298.621]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE

Result of Node 2 after command:

nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /home/nodo2/linuxptp-2.0/configs/gPTP.cfg -m
ptp4l[310.599]: selected /dev/ptp0 as PTP clock
ptp4l[310.637]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[310.637]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[314.055]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[314.055]: selected local clock aabbcc.fffe.00094a as best master
ptp4l[314.055]: assuming the grand master role

How could I fix it?

Thank you so much for your quick response.

Diego G.

vladimiroltean commented 4 years ago
  1. Since the default user in OpenIL is root, sudo is unnecessary. But I agree that it should be mentioned that ptp4l requires administrator privileges for changing timestamping settings.
  2. linuxptp version 2.0 doesn't mean a lot. On the master branch there have been 123 new commits since release tag 2.0. But anyway, that being said, even release version 2.0 should be enough.

> LS1021ATSN: ptp4l -i eth2 -f /etc/ptp4l_cfg/gPTP.cfg -m

No, not like this. You need to run ptp4l on the individual switch interfaces, not on their host port. Like this:

ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m

> Result of LS1021ATSN after command:
>
> nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /etc/ptp4l_cfg/gPTP.cfg -m
> ptp4l[298.447]: selected /dev/ptp0 as PTP clock
> ptp4l[298.540]: driver rejected most genral HWSTAMP filter
> ptp4l[298.540]: ioctl SIOCSHWSTAMP failed: Device or resource busy
> ptp4l[298.620]: port 1: INITIALIZING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> ptp4l[298.621]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE

This confuses me. LS1021A-TSN does not have an interface named enp3s0. That being said, if you intended to say "ptp4l -i eth2", then the error is fully expected, see above.

Regards, -Vladimir

diegotxegp commented 4 years ago
> 1. Since the default user in OpenIL is root, sudo is unnecessary. But I agree that it should be mentioned that ptp4l requires administrator privileges for changing timestamping settings.

I use "sudo" on my nodes. They do not have privileges by default. On the LS1021ATSN platform, yes, I am "root".

> 2. linuxptp version 2.0 doesn't mean a lot. On the master branch there have been 123 new commits since release tag 2.0. But anyway, that being said, even release version 2.0 should be enough.

I will check this in depth. Thank you.

> > LS1021ATSN: ptp4l -i eth2 -f /etc/ptp4l_cfg/gPTP.cfg -m
>
> No, not like this. You need to run ptp4l on the individual switch interfaces, not on their host port. Like this:
>
> ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m

Yes. As you can read in my first post, I said that I tried both "-i eth2" and "-i swp2 -i swp3", and both gave wrong results.

> > Result of LS1021ATSN after command:
> >
> > nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /etc/ptp4l_cfg/gPTP.cfg -m
> > ptp4l[298.447]: selected /dev/ptp0 as PTP clock
> > ptp4l[298.540]: driver rejected most genral HWSTAMP filter
> > ptp4l[298.540]: ioctl SIOCSHWSTAMP failed: Device or resource busy
> > ptp4l[298.620]: port 1: INITIALIZING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > ptp4l[298.621]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
>
> This confuses me. LS1021A-TSN does not have an interface named enp3s0. That being said, if you intended to say "ptp4l -i eth2", then the error is fully expected, see above.

Sorry about this. I copied the output and forgot to remove that line. You are right that enp3s0 is wrong. For anyone else reading these comments, what I meant to write was: "ptp4l -i eth2 -f /etc/ptp4l_cfg/gPTP.cfg -m"

Having said that, I have run the commands again with your modifications, and these are the results. To anticipate: it still fails.

LS1021ATSN: ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m

[root@LS1021ATSN ~] # ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m
ptp4l[4350.693]: selected /dev/ptp1 as PTP clock
ptp4l[4350.790]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[4350.870]: port 2: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[4350.872]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[4353.920]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[4353.920]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[4353.920]: assuming the grand master role
ptp4l[4353.966]: port 2: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[4353.966]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[4353.966]: assuming the grand master role
ptp4l[4353.966]: assuming the grand master role
ptp4l[4354.102]: port 2: new foreign master aabbcc.fffe.00094a-1
ptp4l[4354.102]: selected best master clock aabbcc.fffe.00094a
ptp4l[4354.102]: assuming the grand master role
ptp4l[4354.102]: assuming the grand master role
ptp4l[4355.219]: timed out while polling for tx timestamp
ptp4l[4355.219]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4355.219]: port 2: send sync failed
ptp4l[4355.219]: port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4355.290]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[4355.290]: assuming the grand master role
ptp4l[4355.290]: assuming the grand master role
ptp4l[4355.794]: timed out while polling for tx timestamp
ptp4l[4355.794]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4355.794]: port 1: send peer delay request failed
ptp4l[4355.794]: port 1: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4355.870]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[4355.870]: assuming the grand master role
ptp4l[4355.870]: assuming the grand master role

This text repeats in a loop.

As for node 2, it assumes that its own local clock is the grand master.

As I said before, I will look into the linuxptp version, because I do not know what else to do. Do you know what is happening, based on the above?

Thank you so much for your quick response.

Diego

vladimiroltean commented 4 years ago

Ok, you're making good progress, and you're now very close to getting it running. I have 2 more comments:

You can stop the systemd service with "systemctl stop ptp4l", disable it forever with "systemctl disable --now ptp4l", and restart it with "systemctl restart ptp4l". Also, if you weren't aware of the ptp4l service, there's also a phc2sys service that you might need to be aware of.

If things still don't work, the next debugging step I would suggest is to add "-l 7" to the command-line invocation of ptp4l on the switch ports (which will increase the log level to debug) and share the output.
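
For anyone who wants to condense a long capture before sharing it, here is a minimal Python sketch (illustrative only, not part of linuxptp or OpenIL; the name `summarize_ptp4l_log` is made up for this example). It lists the port state transitions and counts the TX timestamp timeouts, assuming the message format seen in the ptp4l output earlier in this thread.

```python
import re

# Matches lines such as:
#   ptp4l[4355.219]: port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
TRANSITION = re.compile(
    r"ptp4l\[(?P<ts>[\d.]+)\]: port (?P<port>\d+): "
    r"(?P<old>[A-Z_]+) to (?P<new>[A-Z_]+) on (?P<event>[A-Z_]+)"
)
TX_TIMEOUT = "timed out while polling for tx timestamp"


def summarize_ptp4l_log(text):
    """Return (transitions, tx_timeouts) extracted from captured ptp4l output."""
    transitions = []
    tx_timeouts = 0
    for line in text.splitlines():
        if TX_TIMEOUT in line:
            tx_timeouts += 1
        m = TRANSITION.search(line)
        if m:
            transitions.append(
                (float(m["ts"]), int(m["port"]), m["old"], m["new"], m["event"])
            )
    return transitions, tx_timeouts


# A few lines copied from the switch output posted earlier in this thread.
sample = """\
ptp4l[4353.966]: port 2: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[4355.219]: timed out while polling for tx timestamp
ptp4l[4355.219]: port 2: send sync failed
ptp4l[4355.219]: port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
"""

transitions, tx_timeouts = summarize_ptp4l_log(sample)
for ts, port, old, new, event in transitions:
    print(f"{ts:10.3f} port {port}: {old} -> {new} ({event})")
print("tx timestamp timeouts:", tx_timeouts)
```

A port that keeps cycling between MASTER and FAULTY together with a non-zero timeout count points at the TX timestamp collection rather than at the protocol configuration.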

diegotxegp commented 4 years ago
> * You are fully aware that by default in OpenIL 1.8, the ptp4l process already (and automatically) runs as a service, right? Check "pidof ptp4l", "systemctl status ptp4l" and "journalctl -u ptp4l -b -f". Since you are trying to run ptp4l directly from the command line now, I assume you have stopped this systemd process, right? Otherwise, you cannot have multiple ptp4l instances running on the same set of ports - they will collide.

I did not stop anything. I just ran the commands I found in the manual.

Without touching anything, I got these results while running "ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m" on the LS1021ATSN platform.

[root@LS1021ATSN ~] # pidof ptp4l
541
[root@LS1021ATSN ~] # systemctl status ptp4l
● ptp4l.service - Precision Time Protocol daemon
   Loaded: loaded (/usr/lib/systemd/system/ptp4l.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
[root@LS1021ATSN ~] # journalctl -u ptp4l -b -f
-- Logs begin at Fri 2020-02-07 15:50:54 UTC. --

> * Increasing `tx_timestamp_timeout` is indeed what is needed. It's not a driver bug, it's just a consequence of the fact that collecting TX timestamps over SPI from a switch port takes longer than the default threshold of 1 ms. You'll notice that the ptp4l systemd service reads the /etc/linuxptp.cfg configuration file, and that one has `tx_timestamp_timeout    20`. So if you disabled that and are using your own command, you should add `--tx_timestamp_timeout 20` to the command-line invocation of ptp4l too.

LS1021ATSN: I added "--tx_timestamp_timeout 20" only on the LS1021ATSN platform and I began to see good results, I think. Hallelujah? :)

[root@LS1021ATSN ~] # ptp4l -i swp2 -i swp3 -f /etc/ptp4l_cfg/gPTP.cfg -m --tx_timestamp_timeout 20
ptp4l[7606.249]: selected /dev/ptp1 as PTP clock
ptp4l[7606.350]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[7606.430]: port 2: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[7606.431]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[7609.409]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[7609.409]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[7609.409]: assuming the grand master role
ptp4l[7609.542]: port 2: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[7609.543]: selected local clock 00049f.fffe.ef0808 as best master
ptp4l[7609.544]: assuming the grand master role
ptp4l[7609.545]: assuming the grand master role
ptp4l[7642.618]: port 2: new foreign master aabbcc.fffe.00094a-1
ptp4l[7642.620]: selected best master clock aabbcc.fffe.00094a
ptp4l[7642.620]: assuming the grand master role
ptp4l[7642.620]: assuming the grand master role
ptp4l[7679.111]: selected best master clock aabbcc.fffe.00094a
ptp4l[7679.111]: assuming the grand master role
ptp4l[7679.111]: assuming the grand master role
ptp4l[7754.533]: selected best master clock aabbcc.fffe.00094a
ptp4l[7754.533]: assuming the grand master role
ptp4l[7754.533]: assuming the grand master role
ptp4l[7777.516]: port 1: new foreign master aabbcc.fffe.00094e-1
ptp4l[7777.517]: selected best master clock aabbcc.fffe.00094e
ptp4l[7777.517]: assuming the grand master role
ptp4l[7777.517]: assuming the grand master role
ptp4l[7807.539]: selected best master clock aabbcc.fffe.00094e
ptp4l[7807.539]: assuming the grand master role
ptp4l[7807.539]: assuming the grand master role
ptp4l[7835.222]: selected best master clock aabbcc.fffe.00094e
ptp4l[7835.224]: assuming the grand master role
ptp4l[7835.225]: assuming the grand master role
ptp4l[8218.626]: selected best master clock aabbcc.fffe.00094a
ptp4l[8218.629]: assuming the grand master role
ptp4l[8218.630]: assuming the grand master role

Node1:

nodo1@nodo1-N-A:~$ sudo ptp4l -i enp3s0 -f /home/nodo1/linuxptp-2.0/configs/gPTP.cfg -m
ptp4l[8568.757]: selected /dev/ptp0 as PTP clock
ptp4l[8568.789]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[8568.789]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[8571.998]: port 1: new foreign master 00049f.fffe.ef0808-1
ptp4l[8572.064]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[8572.066]: selected local clock aabbcc.fffe.00094e as best master
ptp4l[8572.067]: assuming the grand master role
ptp4l[8573.998]: selected best master clock 00049f.fffe.ef0808
ptp4l[8573.999]: port 1: MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[8574.928]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[8575.679]: rms 237 max 312 freq +16779 +/- 161 delay 236 +/- 0
ptp4l[8576.680]: rms 39 max 79 freq +16881 +/- 52 delay 236 +/- 0
ptp4l[8577.681]: rms 56 max 65 freq +17001 +/- 20 delay 236 +/- 0
ptp4l[8578.683]: rms 40 max 55 freq +17031 +/- 3 delay 236 +/- 0

Node2:

nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /home/nodo2/linuxptp-2.0/configs/gPTP.cfg -m
[sudo] password for nodo2:
ptp4l[7616.302]: selected /dev/ptp0 as PTP clock
ptp4l[7616.333]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[7616.333]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[7619.890]: port 1: new foreign master 00049f.fffe.ef0808-2
ptp4l[7619.954]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[7619.954]: selected local clock aabbcc.fffe.00094a as best master
ptp4l[7619.954]: assuming the grand master role
ptp4l[7621.891]: selected best master clock 00049f.fffe.ef0808
ptp4l[7621.891]: port 1: MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[7622.683]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[7623.309]: rms 795935305075302528 max 1591870610150606080 freq +8917 +/- 3938 delay 238 +/- 0
ptp4l[7624.311]: rms 695 max 1062 freq +13063 +/- 938 delay 237 +/- 0
ptp4l[7625.311]: rms 1155 max 1211 freq +15155 +/- 299 delay 236 +/- 0
ptp4l[7626.313]: rms 727 max 967 freq +15552 +/- 28 delay 236 +/- 0
ptp4l[7627.315]: rms 234 max 388 freq +15326 +/- 84 delay 234 +/- 0
ptp4l[7628.316]: rms 42 max 65 freq +15073 +/- 51 delay 234 +/- 0
ptp4l[7629.317]: rms 69 max 77 freq +14956 +/- 15 delay 234 +/- 0
ptp4l[7630.319]: rms 42 max 57 freq +14937 +/- 5 delay 234 +/- 0
ptp4l[7631.320]: rms 13 max 21 freq +14952 +/- 7 delay 234 +/- 0
ptp4l[7632.321]: rms 5 max 9 freq +14966 +/- 7 delay 234 +/- 0
ptp4l[7633.323]: rms 5 max 8 freq +14971 +/- 5 delay 234 +/- 0
ptp4l[7634.324]: rms 3 max 5 freq +14971 +/- 3 delay 236 +/- 0
ptp4l[7635.325]: rms 3 max 5 freq +14972 +/- 3 delay 236 +/- 0
ptp4l[7636.326]: rms 3 max 5 freq +14973 +/- 3 delay 236 +/- 0
ptp4l[7637.328]: rms 4 max 8 freq +14970 +/- 6 delay 236 +/- 0
ptp4l[7637.343]: timed out while polling for tx timestamp
ptp4l[7637.343]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[7637.344]: port 1: send peer delay request failed
ptp4l[7637.344]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[7653.408]: port 1: FAULTY to LISTENING on INIT_COMPLETE
ptp4l[7656.443]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[7656.443]: selected local clock aabbcc.fffe.00094a as best master
ptp4l[7656.443]: assuming the grand master role
ptp4l[7656.897]: port 1: new foreign master 00049f.fffe.ef0808-2
ptp4l[7658.897]: selected best master clock 00049f.fffe.ef0808
ptp4l[7658.897]: port 1: MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[7659.491]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[7660.367]: rms 122 max 214 freq +15186 +/- 58 delay 234 +/- 0
ptp4l[7661.368]: rms 34 max 49 freq +15029 +/- 33 delay 234 +/- 0
ptp4l[7662.370]: rms 47 max 58 freq +14958 +/- 7 delay 234 +/- 0
ptp4l[7663.371]: rms 24 max 37 freq +14954 +/- 6 delay 234 +/- 0
ptp4l[7664.372]: rms 8 max 15 freq +14963 +/- 6 delay 236 +/- 0
ptp4l[7665.374]: rms 4 max 7 freq +14975 +/- 3 delay 235 +/- 0
ptp4l[7666.375]: rms 4 max 9 freq +14977 +/- 5 delay 235 +/- 0
ptp4l[7667.376]: rms 3 max 6 freq +14976 +/- 4 delay 236 +/- 0
ptp4l[7668.378]: rms 4 max 6 freq +14978 +/- 5 delay 235 +/- 0
ptp4l[7669.379]: rms 3 max 6 freq +14978 +/- 5 delay 235 +/- 0
ptp4l[7670.380]: rms 4 max 7 freq +14979 +/- 5 delay 235 +/- 0

I have two queries about this.

  1. Should I also add "--tx_timestamp_timeout 20" to the commands on the nodes?
  2. If the fix is to increase the timeout to 20, why is it not set in gPTP.cfg?

> You can stop the systemd service with "systemctl stop ptp4l", disable it forever with "systemctl disable --now ptp4l", and restart it with "systemctl restart ptp4l".

I did not stop anything, yet I got the above results. Is something wrong? Please explain if there is still something wrong.

> Also, if you weren't aware of the ptp4l service, there's also a phc2sys service that you might need to be aware of.

Ok. I have taken note of it.

Tell me if I need to do anything else. I did not fully understand the part about stopping the systemd service, because I did not touch anything and it worked. I am confused.

Thank you so much, Vladimir.

Diego G.

vladimiroltean commented 4 years ago

> I just ran the commands I found in the manual.

Ok, so the service is installed but disabled by default. Good to know. To enable it, you would need to:

systemctl enable --now ptp4l
systemctl enable --now phc2sys

> Should I also add "--tx_timestamp_timeout 20" to the commands on the nodes?

It is a driver-specific setting. Different hardware will take a different amount of time to collect TX timestamps.

> If the fix is to increase the timeout to 20, why is it not set in gPTP.cfg?

See above. To be fair, the LS1021A-TSN is setting this value into its linuxptp.cfg file (which is based on gPTP.cfg): https://github.com/openil/openil/blob/master/board/nxp/ls1021atsn/rootfs_overlay/etc/linuxptp.cfg#L22 But you are not using that. You would be if you were using the systemd service.

Does it help if you increase tx_timestamp_timeout for your enp3s0 card? What driver does it use, by the way?

diegotxegp commented 4 years ago

> Does it help if you increase tx_timestamp_timeout for your enp3s0 card? What driver does it use, by the way?

If I do not add "--tx_timestamp_timeout 20" to the commands on my nodes, this happens:

nodo2@nodo2-N-A:~$ sudo ptp4l -i enp3s0 -f /home/nodo2/linuxptp-2.0/configs/gPTP.cfg -m
[sudo] password for nodo2:
ptp4l[295.824]: selected /dev/ptp0 as PTP clock
ptp4l[295.869]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[295.870]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[299.254]: port 1: new foreign master 00049f.fffe.ef0808-2
ptp4l[299.518]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[299.518]: selected local clock aabbcc.fffe.00094a as best master
ptp4l[299.520]: assuming the grand master role
ptp4l[301.255]: selected best master clock 00049f.fffe.ef0808
ptp4l[301.255]: port 1: MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[302.238]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[302.864]: rms 795946901503803904 max 1591893803007609088 freq +10209 +/- 4374 delay 236 +/- 0
ptp4l[303.866]: rms 695 max 1064 freq +14539 +/- 945 delay 236 +/- 0
ptp4l[304.867]: rms 1153 max 1213 freq +16641 +/- 306 delay 236 +/- 0
ptp4l[305.868]: rms 731 max 971 freq +17050 +/- 28 delay 236 +/- 0
ptp4l[306.870]: rms 235 max 386 freq +16824 +/- 86 delay 235 +/- 0
ptp4l[307.871]: rms 43 max 65 freq +16576 +/- 55 delay 235 +/- 0
ptp4l[308.872]: rms 68 max 78 freq +16457 +/- 16 delay 235 +/- 0
ptp4l[309.874]: rms 41 max 61 freq +16437 +/- 5 delay 235 +/- 0
ptp4l[310.875]: rms 13 max 21 freq +16451 +/- 6 delay 235 +/- 0
ptp4l[311.876]: rms 5 max 8 freq +16469 +/- 4 delay 233 +/- 0
ptp4l[312.878]: rms 5 max 10 freq +16473 +/- 6 delay 232 +/- 0
ptp4l[313.879]: rms 4 max 8 freq +16474 +/- 5 delay 233 +/- 0
ptp4l[314.880]: rms 4 max 8 freq +16468 +/- 5 delay 235 +/- 0
ptp4l[314.881]: timed out while polling for tx timestamp
ptp4l[314.882]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[314.882]: port 1: send peer delay request failed
ptp4l[314.882]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

Sometimes a message like this last one appears, and then the node goes back to synchronising with the master (LS1021ATSN). I noticed that with "--tx_timestamp_timeout 20" added, it does not happen. Tell me what you recommend.

The driver it uses is: driver=igb driverversion=5.4.0-k (according to the "sudo lshw -C network" command).

I have another question for you. According to section 5.6 Quick Start for IEEE 802.1AS, I must remove "neighborPropDelayThresh 800" from the gPTP file. Should I remove it or keep it?

Thank you so much for your help.

Diego G.

vladimiroltean commented 4 years ago

Hi Diego,

> I noticed that with "--tx_timestamp_timeout 20" added, it does not happen. Tell me what you recommend.

There is no downside to keeping the TX timestamping timeout at 20 ms, if it works and if timestamps are delivered faster than the Sync packet interval (which they are, since for 802.1AS, that is set at 125 ms). Synchronization is not going to be worse in any way or things like that. As for the "[TX timestamp not received within 1 ms => ] likely caused by a driver bug" message, whoever wrote that must be totally disconnected from the real world.
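
The 125 ms figure follows from the log2 encoding of PTP message intervals. Here is a minimal Python sketch of the arithmetic (illustrative only; it assumes `logSyncInterval -3`, which I believe is what the stock gPTP.cfg sets, and the function name `log_interval_to_ms` is made up for this example).

```python
def log_interval_to_ms(log_interval):
    """Convert a PTP log2 message interval (2**n seconds) to milliseconds."""
    return (2.0 ** log_interval) * 1000.0


# Assumption: logSyncInterval is -3, as in the stock linuxptp gPTP.cfg.
LOG_SYNC_INTERVAL = -3
# Value passed via --tx_timestamp_timeout, in milliseconds.
TX_TIMESTAMP_TIMEOUT_MS = 20

sync_interval_ms = log_interval_to_ms(LOG_SYNC_INTERVAL)
print(f"Sync interval: {sync_interval_ms} ms")
print(
    "timeout fits within one Sync interval:",
    TX_TIMESTAMP_TIMEOUT_MS < sync_interval_ms,
)
```

With these values the timeout is well under one Sync interval, so waiting up to 20 ms for a TX timestamp cannot delay the next Sync message.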

> The driver it uses is: driver=igb driverversion=5.4.0-k

Good to know if others might be reading this.

> According to section 5.6 Quick Start for IEEE 802.1AS, I must remove "neighborPropDelayThresh 800" from the gPTP file. Should I remove it or keep it?

IEEE 802.1AS denies non-PTP-aware systems from the synchronization network, since those would introduce jitter in forwarding latency of PTP packets and the entire synchronization would fall apart. But those non-PTP-aware systems don't speak PTP (of course) so there needs to be some mechanism to detect these below-the-radar devices. The method that is used is to look at the propagation delay (RX timestamp - TX timestamp) of PTP packets. If it is above a certain threshold (trial and error), it means that there must be some sort of digital logic which is receiving a packet on one port and sending it on another (corollary: if it's lower than the threshold it means that it's just the PHY and wire propagation delay between a point-to-point link). But each Ethernet PHY has its own specific propagation delay (often times asymmetric for the RX and TX directions). Some PHY vendors publish those numbers, some don't. The average propagation delay contains an RX propagation delay of one PHY and the TX propagation delay of the other PHY, and is printed by ptp4l in the statistics (example: delay 235 [nanoseconds] in your output). The threshold configured in gPTP.cfg is 800 ns. Therefore, if the propagation delay of the 2-PHY system would exceed 800 ns, ptp4l would think there's something like a PTP-unaware switch in between, and would prevent going into asCapable mode. So that chapter from the documentation is just informing you that, as a debugging step, you might try to remove the threshold so that 802.1AS synchronization failures due to that reason could be avoided. But since your propagation delay is well below the threshold, I would suggest that you're in the clear and that you have no reason to remove it.
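
To check this on your own captures, here is a minimal Python sketch (illustrative only, not part of linuxptp; the names `delays_ns` and `NEIGHBOR_PROP_DELAY_THRESH_NS` are made up for this example). It pulls the delay value out of the ptp4l summary lines and compares the largest one against the 800 ns threshold, assuming the summary format seen in the node output earlier in this thread.

```python
import re

# Matches the periodic summary ptp4l prints while in the SLAVE state, e.g.:
#   ptp4l[7636.326]: rms 3 max 5 freq +14973 +/- 3 delay 236 +/- 0
STATS = re.compile(
    r"rms\s+(\d+)\s+max\s+(\d+)\s+freq\s+([+-]\d+).*delay\s+(-?\d+)"
)

# Threshold from gPTP.cfg as quoted in this thread, in nanoseconds.
NEIGHBOR_PROP_DELAY_THRESH_NS = 800


def delays_ns(log_text):
    """Yield the delay value (ns) from each ptp4l summary line."""
    for line in log_text.splitlines():
        m = STATS.search(line)
        if m:
            yield int(m.group(4))


# Two summary lines copied from the Node2 output posted earlier.
sample = """\
ptp4l[7636.326]: rms 3 max 5 freq +14973 +/- 3 delay 236 +/- 0
ptp4l[7660.367]: rms 122 max 214 freq +15186 +/- 58 delay 234 +/- 0
"""

worst = max(delays_ns(sample))
print(f"largest observed delay: {worst} ns")
print("below neighborPropDelayThresh:", worst < NEIGHBOR_PROP_DELAY_THRESH_NS)
```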

Hope this helps, -Vladimir

diegotxegp commented 4 years ago

Thank you so much, Vladimir. Your help was fundamental for me.

I also hope that this thread helps more people. I tried to explain my problem in great detail so that it would be easy to follow. And with your comments, it is crystal clear.

Thank you so much again, Vladimir.

Diego G.