ve3wmb / OrionWspr

Arduino WSPR Beacon for Amateur Radio pico-Balloons using SI5351a
GNU General Public License v3.0
16 stars 5 forks source link

Degradation of GPS signal during calibration results in skipped Transmit cycles #5

Open ve3wmb opened 4 years ago

ve3wmb commented 4 years ago

Note below that during calibration, each iteration should take 10 seconds, with the log trailing by about 1 second, so timestamps for each iteration should be about 11 seconds apart. Note the frequency of the iteration logged at 2:36:9 that I have put in bold font below : It is 73599940 !! Which I can't explain. Most importantly if you look at the timestamps, that iteration took 24 seconds ! You'll note that something similar happens a few more times during this calibration cycle (highlighted in bold). The cumulative effect is that the total time needed for the whole calibration cycle is 5 minutes and 12 seconds which is way too long.

If you note the timestamp on the SWERR that I highlighted in green bold ... it is at 2:40:1. What this is saying is that we completed calibration and are waiting for a TELEMETRY_TIME_EV (value 4) but instead we are getting a PRIMARY_WSPR_TX_TIME_EV (value 6) which is what you would normally expect to see from the scheduler at 40 minutes after the hour. We expect this sort of scenario to happen on startup, as the startup time is asynchronous and the scheduler is synchronous. Once we have completed a full transmit time cycle the two should be in sync.

In this case what has happened is that the overrun on calibration has broken that synchronization for the current TX cycle. The approach of the state machine is to simply ignore events that it is not expecting so in this case we get stuck in WAIT_TELEMETRY_ST (value 3) till the next time around when we see a TELEMETRY_TIME_EV to unstick us. That causes us to skip both transmissions for that TX cycle.

2019-10-14 2:33:53 running CALIBRATION for 4 minutes iteration: 1 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:34:15 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8930 New Corr factor : -8940 iteration: 2 overflowCounter: 488 TCNT1: 18434 2019-10-14 2:34:26 Sampled Freq Hz x 100 : 0320000020 Old Corr factor : -8940 New Corr factor : -8930 iteration: 3 overflowCounter: 488 TCNT1: 18432 2019-10-14 2:34:37 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930 iteration: 4 overflowCounter: 488 TCNT1: 18432 2019-10-14 2:34:48 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930 iteration: 5 overflowCounter: 488 TCNT1: 18432 2019-10-14 2:34:59 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930 iteration: 6 overflowCounter: 537 TCNT1: 7168 2019-10-14 2:35:11 Sampled Freq Hz x 100 : 0352000000 Old Corr factor : -8930 New Corr factor : -8920 iteration: 7 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:35:22 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8920 New Corr factor : -8930 iteration: 8 overflowCounter: 488 TCNT1: 18433 2019-10-14 2:35:33 Sampled Freq Hz x 100 : 0320000010 Old Corr factor : -8930 New Corr factor : -8920 iteration: 9 overflowCounter: 488 TCNT1: 18432 2019-10-14 2:35:45 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8920 New Corr factor : -8920 iteration: 10 overflowCounter: 1123 TCNT1: 3066 2019-10-14 2:36:9 Sampled Freq Hz x 100 : 0735999940 Old Corr factor : -8920 New Corr factor : -8910 iteration: 11 overflowCounter: 488 TCNT1: 18429 2019-10-14 2:36:25 Sampled Freq Hz x 100 : 0319999970 Old Corr factor : -8910 New Corr factor : -8920 iteration: 12 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:36:36 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8920 New Corr factor : -8930 iteration: 13 overflowCounter: 781 TCNT1: 16383 2019-10-14 2:36:53 Sampled Freq Hz x 100 : 0511999990 Old Corr factor : -8930 New Corr factor : -8920 iteration: 14 overflowCounter: 781 TCNT1: 16382 2019-10-14 2:37:10 Sampled Freq Hz x 100 : 0511999980 Old Corr factor : -8920 New Corr factor : -8910 iteration: 15 overflowCounter: 683 TCNT1: -26628 2019-10-14 2:37:25 Sampled Freq Hz x 100 : 0447344600 Old Corr factor : -8910 New Corr factor : -8900 iteration: 16 overflowCounter: 732 TCNT1: 27646 2019-10-14 2:37:41 Sampled Freq Hz x 100 : 0479999980 Old Corr factor : -8900 New Corr factor : -8890 iteration: 17 overflowCounter: 488 TCNT1: 18432 2019-10-14 2:37:57 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8890 New Corr factor : -8890 iteration: 18 overflowCounter: 537 TCNT1: 7167 2019-10-14 2:38:9 Sampled Freq Hz x 100 : 0351999990 Old Corr factor : -8890 New Corr factor : -8880 iteration: 19 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:38:20 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8880 New Corr factor : -8890 iteration: 20 overflowCounter: 488 TCNT1: 18430 2019-10-14 2:38:31 Sampled Freq Hz x 100 : 0319999980 Old Corr factor : -8890 New Corr factor : -8900 iteration: 21 overflowCounter: 537 TCNT1: 7168 2019-10-14 2:38:43 Sampled Freq Hz x 100 : 0352000000 Old Corr factor : -8900 New Corr factor : -8890 iteration: 22 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:38:54 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8890 New Corr factor : -8900 iteration: 23 overflowCounter: 488 TCNT1: 18431 2019-10-14 2:39:5 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8900 New Corr factor : -8910 2019-10-14 2:39:5 >> orion PRE sm trace: curr_state: 2 curr_event: 3 2019-10-14 2:39:5 << orion POST sm trace: curr_state: 3 event_just_processed: 3 action: 0 2019-10-14 2:40:1 >> orion PRE sm trace: curr_state: 3 curr_event: 6 2019-10-14 2:40:1 ***SWERR: 2 data dump in hex: 6

Because this is working most of the time, my suspicion is that this is due to missing PPS pulses.

The ISR simply counts the number of PPS pulses it gets, stops at 10 and then tallies the value in Timer1/Counter1. If we lose some pulses then the frequency counter will continue to run until we have seen 10 PPS pulses so that could explain the much higher frequency count values (double the time means double the count).

After some more consideration I think that the guard timer on Calibration would solve this problem.

The issue at hand is missing transmissions, due to calibration running over its allotted time and throwing off synchronization with the scheduler, causing lost events.

This is a side effect of something which we have little control over, which is the momentary loss of PPS due to degradation of the GPS signal (we think) during a calibration cycle.

Having a few frequency measurements that are out of wack isn’t a big problem as the most damage we can do is to erroneously change our frequency by + 2.3 Hz (correction step size is 0.1 Hz with 23 samples). What I have typically seen is that 3 or 4 measurements are off due to missing PPS pulses so this error would be more like < +0.5 Hz.

That in itself isn’t going to cause a big issue. What is the bigger problem is taking more than the allotted 4 or so minutes and causing us to miss the next event which results in holes in the transmission schedule.

A virtual guard timer (using Chrono.h) would start on reception of PPS during calibration and be cancelled when calibration is completed. We would check for expiry before each calibration iteration and fail calibration if the value was exceeded. This would ensure that we don’t lose any events and impact transmission.

Note that we have agreed that this issue does not gate Release 1. - VE3WMB