lorabasics / basicstation

LoRa Basics™ Station - The LoRaWAN Gateway Software
https://doc.sm.tc/station

Repeated excessive clock drifts between MCU/SX1301#0 RAK833 #63

Closed jawadiot closed 2 years ago

jawadiot commented 4 years ago

Hi,

I'm running Basic Station on a Raspberry Pi Compute Module 3 with a RAKwireless 831 concentrator. I managed to run the live-s2.sm.tc example, but the concentrator reports these errors:

2020-05-14 22:19:31.086 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (3 retries): -2978.4ppm (threshold 100.0ppm)
2020-05-14 22:19:34.237 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (6 retries): -2751.8ppm (threshold 100.0ppm)
2020-05-14 22:19:37.387 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (9 retries): -2552.3ppm (threshold 100.0ppm)
2020-05-14 22:19:40.538 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (12 retries): -2377.3ppm (threshold 100.0ppm)
2020-05-14 22:19:43.688 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (15 retries): -2223.0ppm (threshold 100.0ppm)
2020-05-14 22:19:46.839 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX1301#0 (18 retries): -2086.5ppm (threshold 100.0ppm)
2020-05-14 22:19:48.939 [SYN:INFO] MCU/SX1301 drift stats: min: -2004.5ppm q50: -2491.4ppm q80: -2899.4ppm max: -3145.6ppm - threshold q90: -3060.2ppm
2020-05-14 22:19:48.939 [SYN:INFO] Avg MCU drift vs SX1301#0: 1.0ppm

On the RAKwireless forum, I saw this advice:

Need to change the spi rate in basicstation/deps/lgw/platform-xxx/libloragw/src/loragw_spi.native.c from 8000000 to 2000000.

I made this change in /opt/basicstation/deps/lgw/platform-rpi/libloragw/src/loragw_spi.native.c, but what do I need to do to apply it? I assume I have to recompile the software that uses the SPI driver, right?

yucheng1993 commented 4 years ago

Hi, have you solved your problem?

craigpeacock commented 3 years ago

jawadiot makes reference to a RAKWireless forum. Here is the link: https://forum.rakwireless.com/t/basicstation-on-diy-gateway/62/27

I'm not convinced the SPI clock speed is the source of the "Repeated excessive clock drifts...". RAKWireless Staff indicate the SPI clock speed was reduced as "There is a high probability that high spi rate will cause sx1301 to fail to start." I do, however, note jawadiot's drifts are quite excessive.

I'm getting a similar 'error' message and reducing the SPI clock has not mitigated it. I also note in the above forum, others have the same experience. However my drifts are only a couple ppm:

[SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (3 retries): 9.5ppm (threshold 5.0ppm)
[SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (6 retries): 5.3ppm (threshold 5.0ppm)
[SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (3 retries): 7.7ppm (threshold 4.9ppm)
[SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (6 retries): 6.1ppm (threshold 4.9ppm)

I don't know if it has anything to do with timesync quality (https://github.com/lorabasics/basicstation/issues/41#issuecomment-500751882) and if it is significant or not.
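To put the two sets of figures side by side, here is a minimal C sketch of what a drift in ppm means. It is not taken from timesync.c: the helper names drift_ppm and drift_is_excessive are hypothetical, and both the sign convention and the comparison of the drift's magnitude against the threshold are assumptions based on how the log lines read.

```c
#include <stdint.h>

/* Relative drift of the MCU clock against the concentrator counter, in ppm.
 * Both arguments are the microseconds each clock counted over the same
 * real-time interval. Hypothetical helper; the sign convention is an
 * assumption, not the one Basic Station necessarily uses. */
double drift_ppm(int64_t mcu_elapsed_us, int64_t sx_elapsed_us) {
    return ((double)mcu_elapsed_us - (double)sx_elapsed_us)
           / (double)sx_elapsed_us * 1e6;
}

/* Returns 1 when the magnitude of the drift exceeds the threshold.
 * Assumption: the threshold applies to the absolute value, as suggested by
 * log lines such as "-95.7ppm (threshold 90.1ppm)". */
int drift_is_excessive(double ppm, double threshold_ppm) {
    double magnitude = ppm < 0 ? -ppm : ppm;
    return magnitude > threshold_ppm;
}
```

Since 1 ppm is 1 µs per second, the -2978.4ppm in the first report means the two clocks disagree by roughly 3 ms every second, while 9.5ppm is only 9.5 µs per second. Both are flagged because each exceeds the threshold in force at the time (100.0ppm and 5.0ppm respectively).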

fk0815 commented 3 years ago

I have the same issue with a RAK833. In my setup the problem is reproducible when I touch the metal part of the antenna connector or the PPS connector, so it could be some EMC sensitivity. Restarting basicstation clears the problem and a resync is done. As a workaround I patched:

diff --git a/src/timesync.c b/src/timesync.c
index 0000216..90d7c7f 100644
--- a/src/timesync.c
+++ b/src/timesync.c
@@ -231,6 +231,10 @@ ustime_t ts_updateTimesync (u1_t txunit, int quality, const timesync_t* curr) {
         }
         if( stats->excessive_drift_cnt >= 2*QUICK_RETRIES )
             stats->drift_thres = MAX_MCU_DRIFT_THRES;  // reset - we might be stuck on a very low value
+        if( stats->excessive_drift_cnt >= 20*QUICK_RETRIES ) {
+            LOG(MOD_SYN|CRITICAL, "excessive_drift_cnt too high! Concentrator hangup? Exit now.");
+            exit(EXIT_FAILURE);
+        }
         return TIMESYNC_RADIO_INTV/2;
     }
     stats->excessive_drift_cnt = 0;
-- 

This exits the application and it is then restarted by systemd automatically.
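The escalation in the patch can be tried out in isolation with a small self-contained C sketch. This is not the code from src/timesync.c: the types drift_stats_t and drift_action_t and the function drift_update are hypothetical, the values of QUICK_RETRIES and MAX_MCU_DRIFT_THRES are assumptions inferred from the logs in this thread (errors every 3 retries, threshold falling back to 100.0ppm), and the magnitude comparison is likewise assumed.

```c
/* Assumed values for illustration; the real constants live in src/timesync.c. */
#define QUICK_RETRIES        3      /* assumption: logs report every 3 retries */
#define MAX_MCU_DRIFT_THRES  100.0  /* assumption: ppm value the logs fall back to */
#define GIVE_UP_FACTOR       20     /* from the patch: 20*QUICK_RETRIES */

typedef enum { DRIFT_OK, DRIFT_RETRY, DRIFT_GIVE_UP } drift_action_t;

typedef struct {
    int    excessive_drift_cnt;  /* consecutive measurements over the threshold */
    double drift_thres;          /* current threshold in ppm */
} drift_stats_t;

/* Feed one drift measurement and report what the caller should do next. */
drift_action_t drift_update(drift_stats_t *stats, double drift_ppm) {
    double magnitude = drift_ppm < 0 ? -drift_ppm : drift_ppm;
    if (magnitude > stats->drift_thres) {
        stats->excessive_drift_cnt++;
        /* As in the original code: reset a threshold that may be stuck low. */
        if (stats->excessive_drift_cnt >= 2 * QUICK_RETRIES)
            stats->drift_thres = MAX_MCU_DRIFT_THRES;
        /* As in the patch: give up once the count gets far too high. */
        if (stats->excessive_drift_cnt >= GIVE_UP_FACTOR * QUICK_RETRIES)
            return DRIFT_GIVE_UP;
        return DRIFT_RETRY;
    }
    stats->excessive_drift_cnt = 0;  /* a good measurement clears the streak */
    return DRIFT_OK;
}
```

With these assumed values, a concentrator stuck around -100.5ppm yields DRIFT_RETRY for 59 consecutive measurements and DRIFT_GIVE_UP on the 60th. Returning an action instead of calling exit() directly is only a choice made here to keep the sketch testable; the patch itself exits in place.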

tonysmith55 commented 3 years ago

You may want to try this: I locked the VPU frequency on a Pi 3 by adding the line "core_freq=250" to /boot/config.txt. I am assuming the Compute Module 3 has exactly the same issue. The explanation behind this can be found at https://www.thethingsnetwork.org/forum/t/rp3-x-ic880a-gateway-stopped-working/15469/12

Having to reduce the SPI speed from 8 MHz to 2 MHz is an issue with the RAK2245 module, due to its SPI drivers being frequency limited, but I've not experienced this on other RAK devices. (Reference: https://www.thethingsnetwork.org/forum/t/do-you-need-the-loraserver-os-for-the-rak831/25860/9)

orvio-craig commented 3 years ago

I'm also seeing this problem, usually resulting in a disconnection. I'm using an outdoor antenna with my RAK2287, which I don't believe needs any modification to the SPI speed. It connects fine, but disconnects after about 3 minutes. I've tried adding the line "core_freq=250" to /boot/config.txt but haven't seen any difference in behaviour...

2021-10-08 12:42:04.469 [S2E:VERB]   TX power: 0.0 dBm EIRP
2021-10-08 12:42:04.469 [S2E:VERB]   JoinEUI list: 0 entries
2021-10-08 12:42:04.469 [S2E:VERB]   NetID filter: FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFF
2021-10-08 12:42:04.469 [S2E:VERB]   Dev/test settings: nocca=0 nodc=0 nodwell=0
2021-10-08 12:42:46.481 [SYN:INFO] MCU/SX130X drift stats: min: -3.3ppm  q50: +6.7ppm  q80: +8.1ppm  max: +10.5ppm - threshold q90: +9.5ppm
2021-10-08 12:42:46.481 [SYN:INFO] Mean MCU drift vs SX130X#0: 6.2ppm
2021-10-08 12:43:04.333 [SYN:INFO] Time sync qualities: min=184 q90=212 max=240 (previous q90=2147483647)
2021-10-08 12:43:21.138 [SYN:VERB] Time sync rejected: quality=213 threshold=212
2021-10-08 12:43:29.545 [SYN:VERB] Time sync rejected: quality=6321 threshold=212
2021-10-08 12:43:31.645 [SYN:INFO] MCU/SX130X drift stats: min: -2.4ppm  q50: +5.2ppm  q80: +7.6ppm  max: +8.6ppm - threshold q90: +8.1ppm
2021-10-08 12:43:31.645 [SYN:INFO] Mean MCU drift vs SX130X#0: 5.0ppm
2021-10-08 12:43:38.996 [SYN:VERB] Time sync rejected: quality=213 threshold=212
2021-10-08 12:43:41.097 [SYN:VERB] Time sync rejected: quality=213 threshold=212
2021-10-08 12:43:55.799 [SYN:VERB] Time sync rejected: quality=213 threshold=212
2021-10-08 12:44:06.301 [SYN:INFO] Time sync qualities: min=167 q90=213 max=6321 (previous q90=212)
2021-10-08 12:44:16.802 [SYN:INFO] MCU/SX130X drift stats: min: +1.9ppm  q50: +5.2ppm  q80: +9.5ppm  max: -95.7ppm - threshold q90: -90.1ppm
2021-10-08 12:44:16.802 [SYN:INFO] Mean MCU drift vs SX130X#0: -9.1ppm
2021-10-08 12:44:16.803 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (3 retries): -95.7ppm (threshold 90.1ppm)
2021-10-08 12:44:19.954 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (6 retries): -100.4ppm (threshold 90.1ppm)
2021-10-08 12:44:23.105 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (9 retries): -100.6ppm (threshold 100.0ppm)
2021-10-08 12:44:26.260 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (12 retries): -101.0ppm (threshold 100.0ppm)
2021-10-08 12:44:29.411 [SYN:ERRO] Repeated excessive clock drifts between MCU/SX130X#0 (15 retries): -100.5ppm (threshold 100.0ppm)
2021-10-08 12:44:35.713 [SYN:VERB] Time sync rejected: quality=239 threshold=213
2021-10-08 12:44:46.214 [SYN:VERB] Time sync rejected: quality=318 threshold=213
2021-10-08 12:44:48.315 [SYN:INFO] MCU/SX130X drift stats: min: -86.2ppm  q50: -100.4ppm  q80: -100.8ppm  max: -101.0ppm - threshold q90: -101.0ppm
2021-10-08 12:44:48.315 [SYN:INFO] Mean MCU drift vs SX130X#0: -97.4ppm
2021-10-08 12:44:52.520 [SYN:INFO] Time sync qualities: min=182 q90=212 max=318 (previous q90=213)
2021-10-08 12:45:01.323 [AIO:DEBU] [3] Connection closed unexpectedly
2021-10-08 12:45:01.323 [AIO:DEBU] [3] WS connection shutdown...
2021-10-08 12:45:01.323 [TCE:VERB] Connection to MUXS closed in state 4
2021-10-08 12:45:01.324 [TCE:INFO] MUXS reconnect backoff 1s (retry 0)

tonysmith55 commented 3 years ago

I haven't checked the Basics Station code to be certain whether GPS is involved in this. I would check that the GPS serial connection is communicating; you may see this in the logs when Basics Station starts.
With the introduction of Bluetooth on the Pi, the serial port changed from /dev/ttyAMA0 to /dev/ttyS0. From memory, I think you can continue to use ttyAMA0 if you disable Bluetooth by adding dtoverlay=disable-bt to /boot/config.txt. For more detail, have a look at https://raspberrypi.stackexchange.com/questions/45570/how-do-i-make-serial-work-on-the-raspberry-pi3-pizerow-pi4-or-later-models/45571#45571 Not sure if this is the issue, but it's where I would start.

orvio-craig commented 3 years ago

I've tried what you suggest, but I don't think it had much of an impact at all. I was able to view the output and see that it had got a GPS fix. Even without that, the PPS signal should still come through to the SX1302, so I'd expect that whether or not the serial port is active shouldn't really make much difference to timing?

orvio-craig commented 3 years ago

Turns out I just had to add a "pps": true to the station.conf file!
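
For anyone landing here later, a sketch of where that line might go. The comment above does not say which section of station.conf it was added to; placing it in the radio section, and naming that section SX1302_conf for an SX1302-based board such as the RAK2287, are assumptions to check against the Basic Station configuration documentation. The /* */ comments follow the style of the example configs shipped with Basic Station.

```json
{
    "SX1302_conf": {
        /* ... existing radio settings stay unchanged ... */
        "pps": true    /* assumption: the key belongs in the radio section */
    },
    "station_conf": {
        /* ... existing station settings stay unchanged ... */
    }
}
```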