jclark / rpi-cm4-ptp-guide

Guide to using the hardware PTP support in the Raspberry Pi CM4
MIT License
71 stars 10 forks source link

event time via /dev/ptp0 not in order sporadically #9

Open kondou opened 1 year ago

kondou commented 1 year ago

My CM4 modules (I have thee) have been up and running over one and half month, and I sometimes see unlock state reported by ts2phc like this.

root@gm2:/home/kondou# grep "master offset" /var/log/syslog|grep s0 | tail -10 Mar 17 16:10:38 gm2 ts2phc: [202588.984] eth0 master offset 999999999 s0 freq +1286 Mar 17 16:10:39 gm2 ts2phc: [202589.992] eth0 master offset 1 s0 freq +1286 Mar 18 08:10:51 gm2 ts2phc: [260201.988] eth0 master offset 1000000004 s0 freq +1186 Mar 18 08:10:52 gm2 ts2phc: [260202.996] eth0 master offset 3 s0 freq +1186 Mar 18 08:17:31 gm2 ts2phc: [260601.984] eth0 master offset 1000000002 s0 freq +1196 Mar 18 08:17:32 gm2 ts2phc: [260602.992] eth0 master offset -2 s0 freq +1196 Mar 18 10:01:08 gm2 ts2phc: [266819.024] eth0 master offset 1000000000 s0 freq +1196 Mar 18 10:01:09 gm2 ts2phc: [266820.032] eth0 master offset -3 s0 freq +1196 Mar 18 11:38:00 gm2 ts2phc: [272630.984] eth0 master offset 999999994 s0 freq +1217 Mar 18 11:38:01 gm2 ts2phc: [272631.992] eth0 master offset -15 s0 freq +1217

I tried to identify what was happening when these situation happened with this command.

root@gm2:/home/kondou# egrep 'extts|poll returns zero|ts2phc.*s0|nmea delay' /var/log/syslog | less

Here is the related logs for the last 2 lines in the above.

Mar 18 11:37:58 gm2 ts2phc: [272629.220] nmea delay: 308031218 ns Mar 18 11:37:58 gm2 ts2phc: [272629.220] eth0 extts index 0 at 1679107115.000000003 corr 0 src 679107115.930324473 diff 3 Mar 18 11:37:59 gm2 ts2phc: [272630.228] nmea delay: 303697621 ns Mar 18 11:37:59 gm2 ts2phc: [272630.228] eth0 extts index 0 at 1679107116.000000003 corr 0 src 1679107116.942360597 diff 3 Mar 18 11:38:00 gm2 ts2phc: [272630.984] nmea delay: 523486561 ns Mar 18 11:38:00 gm2 ts2phc: [272630.984] eth0 extts index 0 at 1679107116.999999994 corr 0 src 1679107116.478568761 diff 999999994 Mar 18 11:38:00 gm2 ts2phc: [272630.984] eth0 master offset 999999994 s0 freq +1217 Mar 18 11:38:01 gm2 ts2phc: [272631.992] nmea delay: 369493444 ns Mar 18 11:38:01 gm2 ts2phc: [272631.992] eth0 extts index 0 at 1679107117.999999985 corr 0 src 1679107118.640551973 diff -15 Mar 18 11:38:01 gm2 ts2phc: [272631.992] eth0 master offset -15 s0 freq +1217 Mar 18 11:38:02 gm2 ts2phc: [272633.000] nmea delay: 246871044 ns Mar 18 11:38:02 gm2 ts2phc: [272633.000] eth0 extts index 0 at 1679107118.999999976 corr 0 src 1679107119.771175602 diff -24 Mar 18 11:38:03 gm2 ts2phc: [272634.008] nmea delay: 242543557 ns Mar 18 11:38:03 gm2 ts2phc: [272634.008] eth0 extts index 0 at 1679107119.999999978 corr 0 src 1679107120.783509313 diff -22 Mar 18 11:38:04 gm2 ts2phc: [272635.016] nmea delay: 232726656 ns Mar 18 11:38:04 gm2 ts2phc: [272635.016] eth0 extts index 0 at 1679107120.999999993 corr 0 src 1679107121.801321605 diff -7

I noticed src time for the 3rd one (1679107116.478568761) comes later at 2nd one (1679107116.942360597), and all of unlocked situation happened just after this kind of reversed ordered time.

This time information is obtained via read() from /dev/ptp0, and do I have to investigate at kernel side? Seeing from the log, it should have been okay, if event.t.sec for the 3rd one was 1679107117, not 1679107116...

jclark commented 1 year ago

I haven't tested an M8F yet, but what I have observed with an M8T is that it sometimes generates an extra spurious pulse (or at least the kernel delivers a spurious pulse). Unfortunately ts2phc doesn't do any sanity checking on the pulses it gets, so if it gets an extra pulse mid-second it will treat that as being the next top-of-second pulse, which causes havoc.

I haven't investigated whether the root cause is

I suggest using bpftrace to see what pulses you are getting. You can start with this script:

https://github.com/jclark/rpi-cm4-ptp-guide/blob/main/files/ptp.bt

I have rather given up on ts2phc for this use case. I am working on a more robust and functional alternative. It's far from complete, but it does work. If you would be interested in testing it, please let me know. (It's open source, but I haven't made the repo public yet.)

jclark commented 1 year ago

How often are you seeing loss of lock?

kondou commented 1 year ago

I suggest using bpftrace to see what pulses you are getting. You can start with this script:

https://github.com/jclark/rpi-cm4-ptp-guide/blob/main/files/ptp.bt

Okay, I will try, but it requires kernel headers, which is trivial thing.

I have rather given up on ts2phc for this use case. I am working on a more robust and functional alternative. It's far from complete, but it does work. If you would be interested in testing it, please let me know. (It's open source, but I haven't made the repo public yet.)

I think it'd be a better approach also for me. I'd also like to try this one. By the way, I've already wrote quick workaround in ts2phc_slave.c. But this came from the experience in my environment, and it just helps part of the cases...

--- ts2phc_slave.c.orig 2020-09-26 01:15:43.000000000 +0900 +++ ts2phc_slave.c 2023-03-18 21:03:30.887205138 +0900 @@ -304,6 +304,7 @@ struct timespec source_ts = src.ts; struct ptp_extts_event event; uint64_t event_ns, source_ns; + static __kernel_old_time_t last_source_ts_tv_sec = 0LL; int cnt; ` cnt = read(slave->fd, &event, sizeof(event)); @@ -328,9 +329,10 @@ return EXTTS_IGNORE; } - if (source_ts.tv_nsec > 500000000) { + if ((source_ts.tv_nsec > 500000000) || ((last_source_ts_tv_sec != 0) && (source_ts.tv_sec != 0) && (last_source_ts_tv_sec == source_ts.tv_sec))) { source_ts.tv_sec++; } + last_source_ts_tv_sec = source_ts.tv_sec; source_ns = source_ts.tv_sec NS_PER_SEC; offset = event_ns + slave->correction - source_ns; *local_ts = event_ns + slave->correction;`

kondou commented 1 year ago

How often are you seeing loss of lock?

I'm not sure if it depends on something. But after two days when started, there was no loss of lock. But after that it happens from 5 to 8 times a day. This was gm3 (which I named the box), and for gm2, it happened almost once a day after two days when started. These two have almost similar antenna location (the best place in my house). Actually, I did factory reset for M8F via u-center. Until then, I changed GNSS so that BeiDou would be used instead of GLONASS. But after started to use BeiDou, the chance of no loss of lock much often. Seeing from u-center, there are much more constellation in my environment, and I reluctantly revert back to the default setting, and the chance of loss of lock was what I wrote earlier in this message.

And I have one more M8F (named as gm1), but I cannot change anything via u-center, and I asked TimeBeat for how to do the factory reset, but they still cannot provide good suggestion. Any write message to M8F looks like ignored, but any data comes from M8F ok, so it looks like working with default setting. Now I'm suspecting serial connection to M8F (TX line) is not working physically ax expected. I'd be able to do more experiment if this one also working completely again. I'm waiting for additional CM4 w/ IO board for this.

jclark commented 1 year ago

Have you configured time mode? With time-mode, you need only a single satellite for a fix. On the M8F, I believe you need to use UBX-CFG-TMODE2. This is unfortunately broken in the latest u-center (22.07). But you can download 22.02 and configure it that way. You then set it to do a survey (try 30 minutes and 20 meters), and see how precise a position you get.

My experience is that I get the most precise results with only GPS enabled. Each GNSS system's time is slightly different, so having multiple GNSS's is not optimal for precision. It might be different if the antenna position is marginal.

kondou commented 1 year ago

Yes, I've once configured the time mode along with survey. But that survey never completed. I didn't have any idea how it would work since this was my first experience with u-center and ublox product. So, I also didn't have any thoughts on the quality of u-center. Your comment is much helpful to me. But I cannot find 22.02 at ublox site. It'd be also helpful to let me know where I can download. I once asked ublox in Japan, but they don't provide any consumer support, and I was told that I need to ask the distributor from who I purchased, which was useless suggestion to me.

kondou commented 1 year ago

I think it'd be a better approach also for me. I'd also like to try this one. By the way, I've already wrote quick workaround in ts2phc_slave.c. But this came from the experience in my environment, and it just helps part of the cases...

So far, this workaround has been helping some of weird situations, but seeing from the related log messages, that tweak should not be exactly the right one, while it avoided loss of lock. This M8F (in gm1) does not accept any commands via serial, while no issue with the received data from M8F.

Mar 19 15:25:18 gm1 ts2phc: [321806.796] nmea delay: 176138364 ns Mar 19 15:25:18 gm1 ts2phc: [321806.796] eth0 extts index 0 at 1679207154.999999996 corr 0 src 1679207155.915000798 diff -4 Mar 19 15:25:18 gm1 ts2phc: [321806.796] eth0 master offset -4 s2 freq -5514 Mar 19 15:25:18 gm1 ts2phc: [321806.848] nmea sentence: GNRMC,062518.00,V,,,,,,,190323,,,N Mar 19 15:25:18 gm1 ptp4l: [321806.890] port 1: master tx announce timeout Mar 19 15:25:18 gm1 ts2phc: [321806.915] nmea sentence: GNGGA,062518.00,,,,,0,03,19.68,,,,,, Mar 19 15:25:18 gm1 ts2phc: [321806.966] nmea sentence: GNGSA,A,1,07,01,,,,,,,,,,,28.55,19.68,20.68 Mar 19 15:25:18 gm1 ts2phc: [321807.016] nmea sentence: GNGSA,A,1,73,,,,,,,,,,,,28.55,19.68,20.68 Mar 19 15:25:18 gm1 ts2phc: [321807.084] nmea sentence: GPGSV,3,1,11,01,72,242,22,03,18,171,14,07,30,232,21,08,49,062, Mar 19 15:25:18 gm1 ts2phc: [321807.151] nmea sentence: GPGSV,3,2,11,10,11,039,13,14,29,315,,16,10,136,08,17,09,281,26 Mar 19 15:25:18 gm1 ts2phc: [321807.218] nmea sentence: GPGSV,3,3,11,21,75,023,,27,19,082,10,30,30,275,19 Mar 19 15:25:18 gm1 ts2phc: [321807.285] nmea sentence: GLGSV,2,1,08,68,12,057,,69,10,105,14,73,28,173,25,74,68,220, Mar 19 15:25:18 gm1 ptp4l: [321807.332] port 1: master sync timeout Mar 19 15:25:18 gm1 ts2phc: [321807.353] nmea sentence: GLGSV,2,2,08,75,38,321,,83,06,048,,84,54,003,,85,35,267,18 Mar 19 15:25:18 gm1 ts2phc: [321807.386] nmea sentence: GNGLL,,,,,062518.00,V,N Mar 19 15:25:18 gm1 ts2phc: [321807.418] nmea sentence: GNZDA,062518.00,19,03,2023,00,00 Mar 19 15:25:19 gm1 ts2phc: [321807.804] nmea: no valid rmc fix Mar 19 15:25:19 gm1 ts2phc: [321807.804] eth0 extts index 0 at 1679207155.999999998 corr 0 src 1679207155.915000798 diff 999999998 Mar 19 15:25:19 gm1 ts2phc: [321807.804] eth0 ignoring invalid master time stamp Mar 19 15:25:19 gm1 ts2phc: [321807.883] nmea sentence: GNRMC,062519.00,A,3537.86924,N,13946.77915,E,0.058,,190323,,,A Mar 19 15:25:19 gm1 ts2phc: [321808.001] nmea sentence: GNGGA,062519.00,3537.86924,N,13946.77915,E,1,04,11.80,26.0,M,39.4,M,, Mar 19 15:25:19 gm1 ts2phc: [321808.051] nmea sentence: GNGSA,A,3,07,01,03,,,,,,,,,,16.77,11.80,11.92 Mar 19 15:25:19 gm1 ts2phc: [321808.102] nmea sentence: GNGSA,A,3,73,,,,,,,,,,,,16.77,11.80,11.92 Mar 19 15:25:19 gm1 ts2phc: [321808.169] nmea sentence: GPGSV,3,1,11,01,72,243,21,03,18,171,20,07,30,232,21,08,49,062, Mar 19 15:25:19 gm1 ts2phc: [321808.236] nmea sentence: GPGSV,3,2,11,10,11,039,12,14,29,315,,16,10,136,08,17,09,281,26 Mar 19 15:25:19 gm1 ts2phc: [321808.304] nmea sentence: GPGSV,3,3,11,21,75,023,,27,19,082,08,30,30,275,19 Mar 19 15:25:19 gm1 ptp4l: [321808.333] port 1: master sync timeout Mar 19 15:25:19 gm1 ts2phc: [321808.371] nmea sentence: GLGSV,2,1,08,68,12,057,,69,10,105,13,73,28,173,25,74,68,220, Mar 19 15:25:19 gm1 ts2phc: [321808.438] nmea sentence: GLGSV,2,2,08,75,38,321,,83,06,048,,84,54,003,,85,35,267,16 Mar 19 15:25:19 gm1 ts2phc: [321808.489] nmea sentence: GNGLL,3537.86924,N,13946.77915,E,062519.00,A,A Mar 19 15:25:19 gm1 ts2phc: [321808.525] nmea sentence: GNZDA,062519.00,19,03,2023,00,00 Mar 19 15:25:20 gm1 ts2phc: [321808.812] nmea delay: 178128146 ns Mar 19 15:25:20 gm1 ts2phc: [321808.812] eth0 extts index 0 at 1679207157.000000008 corr 0 src 1679207157.929324213 diff 8 Mar 19 15:25:20 gm1 ts2phc: [321808.812] eth0 master offset 8 s2 freq -5503 Mar 19 15:25:20 gm1 ts2phc: [321808.844] nmea sentence: GNRMC,062520.00,V,,,,,,,190323,,,N Mar 19 15:25:20 gm1 ptp4l: [321808.890] port 1: master tx announce timeout Mar 19 15:25:20 gm1 ts2phc: [321808.911] nmea sentence: GNGGA,062520.00,,,,,0,04,19.67,,,,,, Mar 19 15:25:20 gm1 ts2phc: [321808.962] nmea sentence: GNGSA,A,1,07,01,03,,,,,,,,,,28.53,19.67,20.67 Mar 19 15:25:20 gm1 ts2phc: [321809.012] nmea sentence: GNGSA,A,1,73,,,,,,,,,,,,28.53,19.67,20.67 Mar 19 15:25:20 gm1 ts2phc: [321809.079] nmea sentence: GPGSV,3,1,11,01,72,243,21,03,18,171,19,07,30,232,21,08,49,062, Mar 19 15:25:20 gm1 ts2phc: [321809.164] nmea sentence: GPGSV,3,2,11,10,11,039,09,14,29,315,,16,10,136,08,17,09,281,26 Mar 19 15:25:20 gm1 ts2phc: [321809.214] nmea sentence: GPGSV,3,3,11,21,75,023,,27,19,082,08,30,30,275,20 Mar 19 15:25:20 gm1 ts2phc: [321809.281] nmea sentence: GLGSV,2,1,08,68,12,057,,69,10,105,11,73,28,173,25,74,68,220, Mar 19 15:25:20 gm1 ptp4l: [321809.333] port 1: master sync timeout Mar 19 15:25:20 gm1 ts2phc: [321809.349] nmea sentence: GLGSV,2,2,08,75,38,321,,83,06,048,,84,54,003,,85,35,267,16 Mar 19 15:25:20 gm1 ts2phc: [321809.382] nmea sentence: GNGLL,,,,,062520.00,V,N Mar 19 15:25:20 gm1 ts2phc: [321809.416] nmea sentence: GNZDA,062520.00,19,03,2023,00,00

jclark commented 1 year ago

You can find old versions of u-center at the bottom of the u-center page (under resources, evaluation software, then deselect current). Link is here: https://content.u-blox.com/sites/default/files/2022-04/u-center-v22.02_0.zip

If the survey doesn't complete, it could be you set too strict a position limit or not enough time. Try something like 30 minutes and 30 meters. If it can't get to 30 meters in 30 minutes, then there's an antenna problem.

The ts2phc code you are tweaking has changed in the current source repo version.

kondou commented 1 year ago

Thank you very much for the link, and I also noticed there are some other versions in the page.

Now I tried to do the survey (30 minutes and 30 meters), and it went to Fixed Mode seen via Time Mode after doing Poll. Lat/Long looks quite accurate by seeing the map. This was done with both GPS and GLONASS enabled. Now, I'm going to do again only with GPS as you suggested.

jclark commented 1 year ago

After it goes into fixed mode, what does u-center report as the fixed position accuracy? This gives some idea of how good your antenna position is.

kondou commented 1 year ago

The first one (GPS/GLONASS) it looks like within 5 meters, while Accuracy u-center told is 15.64m. With GPS only, Accuracy u-center told is 11.42m, but Lat/Long is about 40 meters away. So GPS/GLONASS combination is much better to me. I will try with GPS/BeiDou to see what it looks like.

kondou commented 1 year ago

Done for GPS/BeiDou. It looks more accurate. Red point with Lat/Long is what google map says, and blue star is where it's actually placed. Should I keep this constellation or should I go only with GPS due to the time difference between constellations?

image

jclark commented 1 year ago

Google Maps isn't a reliable way of measuring the accuracy. I would pay more attention to the accuracy that the receiver reports. If you're in Japan, I suggest enabling QZSS (since QZSS uses the same time base as GPS).

To start off with I would use time mode with just GPS (and possibly QZSS). That gives a good simple baseline. I would do another survey, for a bit longer and set a tighter accuracy requirement.

Also in the UBX-CFG-TP5, set it to align to GPS rather than UTC.

Then test with ts2phc, and see how frequently it loses synchronization.

kondou commented 1 year ago

Thank you very much for the suggestion. Yes, I'm Japan, but I've never observed QZSS while the default setting of M8F has QZSS enabled since I started to run the devices.

Okay, I will go only with GPS, and I've already started the survey with longer time and more accuracy.

For UBX-CFG-TP5, the default is GPS, not UTC, hence it should be okay.

And where can I get the latest ts2phc?

jclark commented 1 year ago

The source code repos are listed here: https://linuxptp.sourceforge.net/

kondou commented 1 year ago

Thank you very much. I've downloaded, and replaced with current ts2phc for gm1 which I cannot do the configuration at the moment. I'm going to watch this kind of log messages in syslog to ensure messages without 's0'.

Mar 19 23:17:57 gm1 ts2phc: [350165.728] /dev/ptp0 offset -4 s2 freq -5384

For the rest of M8Fs, I will replace ts2phc after ongoing survey.

kondou commented 1 year ago

For M8F with time mode (surveyed with GPS/GLONASS), I saw these messages (these are not all). So the situation looks not like improved...

Mar 20 00:21:10 gm2 ts2phc: [404820.985] /dev/ptp0 offset 999987400 s0 freq -16273 Mar 20 00:21:11 gm2 ts2phc: [404821.992] /dev/ptp0 offset 1000005032 s0 freq -16273 Mar 20 00:21:25 gm2 ts2phc: [404836.108] /dev/ptp0 offset -1000000856 s0 freq +36 Mar 20 00:21:26 gm2 ts2phc: [404837.116] /dev/ptp0 offset -999999531 s0 freq +36 Mar 20 00:21:41 gm2 ts2phc: [404851.984] /dev/ptp0 offset 1000000145 s0 freq +2582 Mar 20 00:21:42 gm2 ts2phc: [404852.992] /dev/ptp0 offset 999998931 s0 freq +2582 Mar 20 00:21:55 gm2 ts2phc: [404866.096] /dev/ptp0 offset -1000002780 s0 freq -1080 Mar 20 00:21:56 gm2 ts2phc: [404867.104] /dev/ptp0 offset -1000000323 s0 freq -1080 Mar 20 00:22:08 gm2 ts2phc: [404879.960] /dev/ptp0 offset 1000004082 s0 freq +2327 Mar 20 00:22:09 gm2 ts2phc: [404880.968] /dev/ptp0 offset 1000003115 s0 freq +2327 Mar 20 00:22:19 gm2 ts2phc: [404890.044] /dev/ptp0 offset -999981843 s0 freq +55217 Mar 20 00:22:20 gm2 ts2phc: [404891.052] /dev/ptp0 offset -1000035708 s0 freq +55217 Mar 20 00:22:30 gm2 ts2phc: [404901.896] /dev/ptp0 offset 999999181 s0 freq -15878 Mar 20 00:22:31 gm2 ts2phc: [404902.904] /dev/ptp0 offset 1000016411 s0 freq -15878 Mar 20 00:22:37 gm2 ts2phc: [404908.953] /dev/ptp0 offset -999635529 s0 freq +219372 Mar 20 00:22:38 gm2 ts2phc: [404909.960] /dev/ptp0 offset -999853549 s0 freq +219372 Mar 20 00:22:45 gm2 ts2phc: [404916.764] /dev/ptp0 offset 1000185904 s0 freq +394093 Mar 20 00:22:46 gm2 ts2phc: [404917.773] /dev/ptp0 offset 999793146 s0 freq +394093 Mar 20 00:22:49 gm2 ts2phc: [404920.796] /dev/ptp0 offset -999785434 s0 freq -2092043 Mar 20 00:22:50 gm2 ts2phc: [404921.804] /dev/ptp0 offset -997692052 s0 freq -2092043 Mar 20 00:22:54 gm2 ts2phc: [404925.584] /dev/ptp0 offset 999086442 s0 freq -328457 Mar 20 00:22:55 gm2 ts2phc: [404926.593] /dev/ptp0 offset 999416235 s0 freq -328457 Mar 20 00:23:01 gm2 ts2phc: [404932.640] /dev/ptp0 offset -999981915 s0 freq -77717 Mar 20 00:23:02 gm2 ts2phc: [404933.648] /dev/ptp0 offset -999902862 s0 freq -77717 Mar 20 00:23:19 gm2 ts2phc: [404950.540] /dev/ptp0 offset 999999698 s0 freq +1227 Mar 20 00:23:20 gm2 ts2phc: [404951.549] /dev/ptp0 offset 999999799 s0 freq +1227 Mar 20 00:23:32 gm2 ts2phc: [404963.648] /dev/ptp0 offset -1000004559 s0 freq +275 Mar 20 00:23:33 gm2 ts2phc: [404964.657] /dev/ptp0 offset -1000003514 s0 freq +275