nordic-auko / nRF5-ble-timesync-demo

nRF52 clock synchronization demo code

Issues with m_master_counter #7

Closed · MegaAdragon closed this issue 3 years ago

MegaAdragon commented 4 years ago

Hi,

I'm trying to synchronize an event on multiple nRF52 boards. This code is a really nice starting point; however, I had a bit of trouble with the counter synchronization.

This is my setup: I'm using m_params.high_freq_timer[1] to synchronize the blinking of an LED. The master toggles the LED every 100 counter ticks (250 ms). Now I want multiple synchronized slaves that toggle their LED at exactly the same moment. After modifying the code, I ran into a few strange behaviors/problems:

1.

https://github.com/nordic-auko/nRF5-ble-timesync-demo/blob/e06a29486a4965277b3fb09c7b62708dfc73abf4/nRF5_SDK_16.0.0_98a08e2/examples/common/time_sync.c#L539

This should be: m_master_counter = mp_curr_adj_pkt->counter_val + 1. Because the timer is reset during synchronization, and the whole reset takes one additional overflow on the master, the counter value needs to be increased by 1 for every synchronization.
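In context, the corrected assignment sits in the slave's EGU handler, right after both timers have been reset (a minimal sketch; my full handler is shown further down in this thread):

    // Sketch: the master's timer[0] wraps once more while the slave applies
    // the adjustment, so the slave must add 1 to the received counter value.
    m_master_counter = mp_curr_adj_pkt->counter_val + 1;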

2.

This one was pretty hard to find, and I'm not sure if my workaround is the best way to resolve it... https://github.com/nordic-auko/nRF5-ble-timesync-demo/blob/e06a29486a4965277b3fb09c7b62708dfc73abf4/nRF5_SDK_16.0.0_98a08e2/examples/common/time_sync.c#L583

If the timer_offset is very small (let's assume timer_offset = 1), it looks like the timer reset does not work reliably. What I would expect with m_params.high_freq_timer[0]->CC[2] = 39999:

  1. CC[2] is reached -> reset timer
  2. CC[0] does not occur, because TIME_SYNC_TIMER_MAX_VAL > 39999

What actually happens:

  1. CC[2] is reached -> reset timer
  2. CC[0] occurs around 125 ns later (looks like two 16 MHz timer steps)

I suspect that this has something to do with a delay in the PPI tasks and the timer itself. I found this information in the infocenter:

After the TIMER is started, the CLEAR task, COUNT task and the STOP task will guarantee to take effect within one clock cycle of the PCLK16M.

This means that if the timer_offset is too small (which happens quite often), the counter value (m_params.high_freq_timer[1]) in the slave is not reset to 0, but to 1 (because the CC[0] event still increments the counter during the reset).

My workaround looks like this: if (timer_offset < 5) { return false; }
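A sketch of where this guard sits, using the variable names from the linked time_sync.c revision (the surrounding offset computation is paraphrased, not quoted):

    // Workaround 2 (sketch): PPI-triggered TIMER tasks can take up to one
    // PCLK16M cycle to take effect, so a CC[2] reset scheduled this close to
    // the current count is unreliable. Discard the packet instead.
    if (timer_offset < 5)
    {
        return false;
    }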

3.

https://github.com/nordic-auko/nRF5-ble-timesync-demo/blob/e06a29486a4965277b3fb09c7b62708dfc73abf4/nRF5_SDK_16.0.0_98a08e2/examples/common/time_sync.c#L559

If this value is already greater than TIME_SYNC_TIMER_MAX_VAL, or if the time it takes to get from this line of code to actually setting CC[2] and activating the PPI group pushes it past TIME_SYNC_TIMER_MAX_VAL, the counter in the master will have been increased in the meantime, and this assignment is no longer valid: https://github.com/nordic-auko/nRF5-ble-timesync-demo/blob/e06a29486a4965277b3fb09c7b62708dfc73abf4/nRF5_SDK_16.0.0_98a08e2/examples/common/time_sync.c#L539

Once again, the result will be a wrong m_master_counter in the slave.

In order to prevent this I added the following workaround: if (peer_timer > (TIME_SYNC_TIMER_MAX_VAL - 10000)) { return false; }

This means that quite a few incoming synchronization requests will be ignored, but because I'm not sure how much time it takes until the following line is called, I added a relatively large safety margin. https://github.com/nordic-auko/nRF5-ble-timesync-demo/blob/e06a29486a4965277b3fb09c7b62708dfc73abf4/nRF5_SDK_16.0.0_98a08e2/examples/common/time_sync.c#L597
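Spelled out, this second guard looks like the sketch below (again using names from the linked revision; the margin of 10000 ticks is 625 us at 16 MHz, chosen empirically):

    // Workaround 3 (sketch): if the master timer is about to wrap around,
    // the counter_val in this packet may already be stale by the time CC[2]
    // and the PPI group are armed. Discard the packet instead.
    if (peer_timer > (TIME_SYNC_TIMER_MAX_VAL - 10000))
    {
        return false;
    }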


Because the synchronization itself and the trigger for the LED toggle event are completely asynchronous, the synchronization between master and slave looked completely random. Sometimes the slave would trigger one counter tick too early, sometimes one tick too late, and sometimes it was correct.

After applying workarounds for all three issues, I tested with an oscilloscope and the synchronization seemed to work perfectly. Is there any more insight you can give me on these issues? Are there any proposals for better solutions than my workarounds? Unfortunately I forgot to take screenshots during these tests; if you like, I can reproduce the described issues and upload some.

Thank you in advance!

nordic-auko commented 4 years ago

Hello, thanks for your feedback! I've started looking at the details, but haven't considered all the information yet.

I do have one question regarding your test setup: Did you adjust TIME_SYNC_TIMER_MAX_VAL such that the high_freq_timer[0] gives you the target LED toggling rate directly, or are you using the timestamp function to synchronize the toggling?

Edit: After re-reading the first paragraph, I see the reference to high_freq_timer[1]. Can you elaborate a bit how you've configured the LED toggling relative to this timer?

MegaAdragon commented 4 years ago

Did you adjust TIME_SYNC_TIMER_MAX_VAL such that the high_freq_timer[0] gives you the target LED toggling rate directly, or are you using the timestamp function to synchronize the toggling?

Code is a little bit chaotic at the moment...

I modified the usage of the debug pin:

    nrf_ppi_channel_and_fork_endpoint_setup(
        NRF_PPI_CHANNEL0,
        (uint32_t) nrf_timer_event_address_get(NRF_TIMER3, NRF_TIMER_EVENT_COMPARE4),
        nrf_gpiote_task_addr_get(NRF_GPIOTE_TASKS_OUT_3),
        (uint32_t) nrf_egu_task_trigger_address_get(NRF_EGU3, 2));
    nrf_ppi_channel_enable(NRF_PPI_CHANNEL0);

EDIT: In my setup the timers are reversed:

    ts_params.high_freq_timer[0] = NRF_TIMER2;
    ts_params.high_freq_timer[1] = NRF_TIMER3;

My EGU interrupt looks like this:

void SWI3_EGU3_IRQHandler(void)
{
    if (NRF_EGU3->EVENTS_TRIGGERED[0] != 0) // both timers have been reset -> CC[2]
    {
        NRF_EGU3->EVENTS_TRIGGERED[0] = 0;

        m_master_counter = mp_curr_adj_pkt->counter_val + 1;

        // calculate next toggle
        magic_offset = mp_curr_adj_pkt->magic_offset;
        m_params.high_freq_timer[1]->CC[4] = magic_offset - m_master_counter % magic_offset;

        m_timer_update_in_progress = false;

        nrf_balloc_free(&m_sync_pkt_pool, (void*)mp_curr_adj_pkt);
    }
    else if (NRF_EGU3->EVENTS_TRIGGERED[2] != 0)
    {
        NRF_EGU3->EVENTS_TRIGGERED[2] = 0;
        m_params.high_freq_timer[1]->CC[4] += magic_offset;
    }
}

magic_offset is a static variable which needs to have the same value on master and slave. If I use static uint32_t magic_offset = 100;, the LED on the master will toggle every 100 counter ticks (250 ms). During an active synchronization, the slave uses m_master_counter to calculate the next CC[4] value so that both toggle at the same time.
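As a worked example with these numbers (the counter value is made up for illustration):

    // Suppose the slave resynchronizes while m_master_counter = 537 and
    // timer[1] has just been reset to 0:
    //   CC[4] = 100 - (537 % 100) = 100 - 37 = 63
    // The compare fires when timer[1] reaches 63, i.e. when the global count
    // 537 + 63 = 600 hits the next multiple of magic_offset on every device.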

nordic-auko commented 4 years ago

Thanks for the details, I do have some comments here:

Regarding m_params.high_freq_timer[1], this is intended to keep track of the m_params.high_freq_timer[0] wraparound count. Keeping track of the wraparounds is required when calculating timestamps. The difficulty here is making sure the wraparound counter is aligned between all devices. Ideally m_master_counter would always reflect the global timing state exactly, but because it is not known how long it will be until the next timing packet is received on the slave side, the local timer[1] is used to keep track in the meanwhile. The corner case where the master timer wraps around before the slave state is updated is not handled properly at the moment. The effect is that you will see jitter of TIME_SYNC_TIMER_MAX_VAL / 16000000 s = 2.5 ms once in a while when using the timestamp function. The "amplitude" of this jitter can be reduced by reducing the TIME_SYNC_TIMER_MAX_VAL value.

I've tested your workarounds, and I see that they do appear to significantly reduce occurrences of jitter in the timestamp function.

The downside of this improvement is that the mean accuracy of the high_freq_timer[0] compared to the peer is worsened. Especially workaround 2, which lets the clock drift for 4 cycles longer than before.

Regarding the workaround item 3, this effect could also be achieved by having the master transmit packets aligned with the timer wraparound. The peers cannot update their local timers more often than once per cycle anyway. Some additional logic would be required to align the master timer with the radio timeslots.

Generally I would not recommend using m_params.high_freq_timer[1] directly, as you have to be aware of all the inner workings of the time_sync.c code.

If your application only needs the timekeeping to toggle the LED at 250ms, the easiest approach would be to set: #define TIME_SYNC_TIMER_MAX_VAL (4000000) and use the debug pin toggling from main.c without modification. The downside of this is that you reduce the accuracy somewhat, as only 4 adjustments can be made per second. The worst-case clock drift should be single-digit microseconds at this rate.

It is not trivial to set up a "downsampled" timer[0] for your purpose, in the event you want to keep a higher wraparound rate for timer[0]. The easiest approach I can think of is using a separate timer and manually updating the compare value based on the timestamp. With your workarounds, I observe a mean error of ~1 us and standard deviation of <20 us in my test setup with 1 master and 1 slave connected to an oscilloscope, using the code below.


void TIMER4_IRQHandler(void)
{
    if (NRF_TIMER4->EVENTS_COMPARE[0] != 0)
    {
        NRF_TIMER4->EVENTS_COMPARE[0] = 0;
        (void) NRF_TIMER4->EVENTS_COMPARE[0];

        nrf_gpio_pin_toggle(NRF_GPIO_PIN_MAP(0, 24));

        uint32_t next_toggle_ticks; // 1 MHz tick unit (TIMER4)
        uint64_t timestamp; // 16 MHz tick unit
        uint64_t next_toggle_timestamp;  // 16 MHz tick unit

        // Schedule next toggle at next closest point where timestamp % 16000000/4 == 0
        timestamp = ts_timestamp_get_ticks_u64(8);
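        // Round the 16 MHz timestamp up to the next multiple of 16000000/4
        // ticks (250 ms): (x + N - 1) / N * N is integer ceiling rounding.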
        next_toggle_timestamp = ((timestamp + (16000000/4 - 1)) / (16000000/4)) * (16000000/4);
        next_toggle_ticks = (uint32_t) ((next_toggle_timestamp - timestamp) / 16);

        // Skip cycle in case of too large deviation
        if (next_toggle_ticks < 10000)
        {
            next_toggle_ticks = 250000 + next_toggle_ticks;
        }

        NRF_TIMER4->CC[0] = next_toggle_ticks;

        // NRF_LOG_INFO("timestamp: %lu", timestamp);
        // NRF_LOG_INFO("next_toggle_timestamp: %lu", next_toggle_timestamp);
        // NRF_LOG_INFO("next_toggle_ticks: %d", next_toggle_ticks);
    }
}

static void led_timer_init(void)
{
    // Configure TIMER4 with 1 MHz clock rate and compare match for 250 ms
    NRF_TIMER4->TASKS_STOP  = 1;
    NRF_TIMER4->TASKS_CLEAR = 1;
    NRF_TIMER4->PRESCALER   = 4;
    NRF_TIMER4->BITMODE     = TIMER_BITMODE_BITMODE_32Bit << TIMER_BITMODE_BITMODE_Pos;
    NRF_TIMER4->CC[0]       = 250000;
    NRF_TIMER4->SHORTS      = TIMER_SHORTS_COMPARE0_CLEAR_Msk;
    NRF_TIMER4->INTENSET    = TIMER_INTENSET_COMPARE0_Msk;

    NVIC_ClearPendingIRQ(TIMER4_IRQn);
    NVIC_SetPriority(TIMER4_IRQn, 7);
    NVIC_EnableIRQ(TIMER4_IRQn);

    NRF_TIMER4->TASKS_START  = 1;
}

MegaAdragon commented 4 years ago

Thank you for your very detailed response! Let me explain my goal in a bit more detail, to show what I actually want to achieve:

In my final setup I have one master and multiple slaves. The master has an active BLE connection to every slave. Now I want the master to tell all the slaves to do an action at exactly the same moment (e.g. toggle a pin). To achieve this, I'm trying to use your implementation as an underlying mechanism to make sure that the slaves actually perform the command at the same time. My plan was to send a timestamp value (based on the master's m_params.high_freq_timer[1]) via BLE to all slaves. If the timestamp is far enough in the future (let's say 100 ms), this should be more than enough time to make sure that all slaves have received the command and are ready to perform the action.

I'm trying to extend the time_sync interface to not only be able to poll a timestamp, but also to insert a PPI task that will be executed when a timestamp in the future is reached.


Here are some more comments:

I've tested your workarounds, and I see that they do appear to significantly reduce occurrences of jitter in the timestamp function.

Only reduce, or remove the jitter completely? I ran a test for about 30 min and wasn't able to reproduce the jitter...

The easiest approach I can think of is using a separate timer and manually updating the compare value based on the timestamp

I thought about this too, but without my workarounds the timestamp on the slaves is not reliable, because the timestamp is a combination of m_params.high_freq_timer[0] and m_master_counter + m_params.high_freq_timer[1]. This is why I created the test case with the 250 ms LED toggle... The jitter you can see on the oscilloscope should also be visible in the timestamp, so everything based on the timestamp will eventually show the same jitter.
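To make the composition concrete, here is a rough sketch of how the 64-bit timestamp is assembled (illustrative only; the real ts_timestamp_get_ticks_u64() captures both timers atomically via PPI, and the capture register index here is arbitrary):

    // Latch the current values of both timers into a capture register.
    m_params.high_freq_timer[0]->TASKS_CAPTURE[3] = 1;
    m_params.high_freq_timer[1]->TASKS_CAPTURE[3] = 1;

    // Full wraps (master epoch + local overflow count) times the wrap period,
    // plus the current position within the period, all in 16 MHz ticks.
    uint64_t ticks = (uint64_t)(m_master_counter + m_params.high_freq_timer[1]->CC[3])
                   * TIME_SYNC_TIMER_MAX_VAL
                   + m_params.high_freq_timer[0]->CC[3];

An off-by-one in m_master_counter therefore shifts the timestamp by exactly TIME_SYNC_TIMER_MAX_VAL ticks, which is the 2.5 ms jitter discussed above.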

Regarding the workaround item 3, this effect could also be achieved by having the master transmit packets aligned with the timer wraparound. The peers cannot update their local timers more often than once per cycle anyway. Some additional logic would be required to align the master timer with the radio timeslots.

I agree... For now I don't see this as a big problem, because the broadcast frequency in the master is quite high (200 Hz) and a few ignored sync packets won't have a big impact on the overall accuracy.

With your workarounds, I observe a mean error of ~1 us and standard deviation of <20 us in my test setup with 1 master and 1 slave connected to an oscilloscope

I have the same setup, but slightly more accurate results: 192 ns mean value with 109 ns standard deviation (measured over 6k toggles). This is far better than the accuracy I need for my application...

nordic-auko commented 4 years ago

I'm trying to extend the time_sync interface to not only be able to poll a timestamp, but also to insert a PPI task that will be executed when a timestamp in the future is reached.

Aha, now the detailed investigation makes a lot of sense! Such a function seems very useful, and very much in line with the "use PPI/TIMER for everything" approach.

What is your target accuracy for this functionality? I am wondering if it would be sufficient to use timer[1], or potentially a different NRF_TIMER altogether (also in counter mode).

Here are some more comments:

I've tested your workarounds, and I see that they do appear to significantly reduce occurrences of jitter in the timestamp function.

Only reduce, or remove the jitter completely? I ran a test for about 30 min and wasn't able to reproduce the jitter...

I still saw the jitter, at maybe a couple of minutes into the test. I used the min/max measurements from the oscilloscope, and this only caught the first occurrence. However, it could be something else causing the remaining jitter. We do have slightly differing test setups after all.

The easiest approach I can think of is using a separate timer and manually updating the compare value based on the timestamp

I thought about this too, but without my workarounds the timestamp on the slaves is not reliable, because the timestamp is a combination of m_params.high_freq_timer[0] and m_master_counter + m_params.high_freq_timer[1]. This is why I created the test case with the 250 ms LED toggle... The jitter you can see on the oscilloscope should also be visible in the timestamp, so everything based on the timestamp will eventually show the same jitter.

Is it an option to decrease TIME_SYNC_TIMER_MAX_VAL, for example to 1600? This would reduce the jitter amplitude to 0.1 ms. The counter will overflow in ~119 hours at this rate.

Regarding the workaround item 3, this effect could also be achieved by having the master transmit packets aligned with the timer wraparound. The peers cannot update their local timers more often than once per cycle anyway. Some additional logic would be required to align the master timer with the radio timeslots.

I agree... For now I don't see this as a big problem, because the broadcast frequency in the master is quite high (200 Hz) and a few ignored sync packets won't have a big impact on the overall accuracy.

With your workarounds, I observe a mean error of ~1 us and standard deviation of <20 us in my test setup with 1 master and 1 slave connected to an oscilloscope

I have the same setup, but slightly more accurate results: 192 ns mean value with 109 ns standard deviation (measured over 6k toggles). This is far better than the accuracy I need for my application...

My oscilloscope did struggle to trigger at the 250ms rate. I had to zoom out a bit to get stable triggering and measurements, possibly at the expense of measurement accuracy.

MegaAdragon commented 4 years ago

My target accuracy is somewhere below 1 ms. However, because I have seen that it is possible to get the accuracy below 1 us, I want to achieve at least around 10 us or so... My concern isn't the accuracy so much as the reliability of the system. I need to make sure that the slaves perform the action at the same time every time, and random jitter makes that more or less impossible.

Is it an option to decrease TIME_SYNC_TIMER_MAX_VAL, for example to 1600? This would reduce the jitter amplitude to 0.1 ms. The counter will overflow in ~119 hours at this rate.

Yes, that would be possible. This way I would be within my accuracy target even if the jitter occurs. I can test this with my setup, but because TX_CHAIN_DELAY is now relatively huge compared to TIME_SYNC_TIMER_MAX_VAL, I would expect a lot more problems with the corner cases around the overflow in the master timer. I will have to discard a lot more sync packets or accept a lot more jitter occurrences.

My oscilloscope did struggle to trigger at the 250ms rate. I had to zoom out a bit to get stable triggering and measurements, possibly at the expense of measurement accuracy.

I only triggered the oscilloscope on the falling edge. This way I was able to zoom in more, but maybe that's why I didn't see any remaining jitter.

MegaAdragon commented 3 years ago

I was busy with a lot of other stuff, but now I had another look at this. I made a few more minor modifications and ran some more tests with my system, and it looks very stable. I can reproduce an overall accuracy of around 200 ns.

Regarding this:

Regarding the workaround item 3, this effect could also be achieved by having the master transmit packets aligned with the timer wraparound. The peers cannot update their local timers more often than once per cycle anyway. Some additional logic would be required to align the master timer with the radio timeslots.

I found no easy way to align the timer with the radio timeslots. To avoid too many discarded packets, I reschedule the next transmission as earliest (instead of normal) whenever the timer is too close to its overflow value. In my case I have the additional difficulty of ongoing BLE activity: when the master has 4 active BLE connections (one for each receiver), the transmission frequency drops from 200 Hz to around 120 Hz.
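For reference, the earliest-vs-normal distinction maps onto the SoftDevice timeslot API roughly like this (a sketch; the length and timeout values are placeholders, not the demo's actual configuration):

    #include "nrf_soc.h"

    static nrf_radio_request_t m_timeslot_request;

    // Sketch: ask for the next timeslot as early as possible instead of at a
    // fixed distance from the previous one. Timing values are placeholders.
    static void request_next_timeslot_earliest(void)
    {
        m_timeslot_request.request_type               = NRF_RADIO_REQ_TYPE_EARLIEST;
        m_timeslot_request.params.earliest.hfclk      = NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED;
        m_timeslot_request.params.earliest.priority   = NRF_RADIO_PRIORITY_NORMAL;
        m_timeslot_request.params.earliest.length_us  = 1000;   // placeholder
        m_timeslot_request.params.earliest.timeout_us = 100000; // placeholder
        (void) sd_radio_request(&m_timeslot_request);
    }

The same nrf_radio_request_t structure can also be handed back from the timeslot signal handler when rescheduling from inside an active slot.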

I also added a callback mechanism for my synchronized event. The API now allows you to register a PPI endpoint and a timer tick in the future; when that tick is reached, the PPI endpoint fires and the callback is invoked via EGU interrupt. In my application, the callback contains some status information about the synchronization event (number of received packets, time of last synchronization, etc.). Because I have an active BLE connection to the master, this is really useful for getting direct feedback and building statistics (for example, to see how many packets were lost). The interface part looks like this:

typedef struct
{
    uint32_t startTick;
    uint32_t targetTick;
    uint32_t lastSync;
    uint16_t syncPacketCnt;
    uint16_t usedPacketCnt;
} ts_trigger_evt_t;

typedef void (*ts_evt_handler_t)(const ts_trigger_evt_t* evt);

uint32_t ts_set_trigger(uint32_t target_tick, uint32_t ppi_endpoint, ts_evt_handler_t callback);
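A hypothetical usage sketch of this proposed API (the GPIOTE endpoint, the tick arithmetic, and the current_tick parameter are illustrative, not part of the actual change):

    // Hypothetical example: trigger a GPIOTE task and receive a status
    // callback 1000 counter ticks (2.5 s at TIME_SYNC_TIMER_MAX_VAL = 40000)
    // after the caller-supplied current global tick.
    static void sync_evt_handler(const ts_trigger_evt_t* evt)
    {
        NRF_LOG_INFO("trigger at tick %u, %u/%u packets used",
                     evt->targetTick, evt->usedPacketCnt, evt->syncPacketCnt);
    }

    void schedule_synchronized_toggle(uint32_t current_tick)
    {
        uint32_t err_code = ts_set_trigger(current_tick + 1000,
                                           nrf_gpiote_task_addr_get(NRF_GPIOTE_TASKS_OUT_3),
                                           sync_evt_handler);
        APP_ERROR_CHECK(err_code);
    }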

If you are interested, I can try to create a pull request and you can merge this functionality into your repository. I added some other modifications; for example, I'm using nrfx_ppi_channel_alloc instead of manually specifying the PPI channels.

nordic-auko commented 3 years ago

I was busy with a lot of other stuff, but now I had another look at this. I made a few more minor modifications and ran some more tests with my system, and it looks very stable. I can reproduce an overall accuracy of around 200 ns.

That's great to hear, and thanks for the update!

Regarding this:

Regarding the workaround item 3, this effect could also be achieved by having the master transmit packets aligned with the timer wraparound. The peers cannot update their local timers more often than once per cycle anyway. Some additional logic would be required to align the master timer with the radio timeslots.

I found no easy way to align the timer with the radio timeslots. To avoid too many discarded packets, I reschedule the next transmission as earliest (instead of normal) whenever the timer is too close to its overflow value. In my case I have the additional difficulty of ongoing BLE activity: when the master has 4 active BLE connections (one for each receiver), the transmission frequency drops from 200 Hz to around 120 Hz.

Hm, yeah I see the problem. I didn't consider the case of multiple connections, which diminishes the number of possible timeslots.

I also added a callback mechanism for my synchronized event. The API now allows you to register a PPI endpoint and a timer tick in the future; when that tick is reached, the PPI endpoint fires and the callback is invoked via EGU interrupt. In my application, the callback contains some status information about the synchronization event (number of received packets, time of last synchronization, etc.). Because I have an active BLE connection to the master, this is really useful for getting direct feedback and building statistics (for example, to see how many packets were lost). The interface part looks like this:

typedef struct
{
  uint32_t startTick;
  uint32_t targetTick;
  uint32_t lastSync;
  uint16_t syncPacketCnt;
  uint16_t usedPacketCnt;
} ts_trigger_evt_t;

typedef void (*ts_evt_handler_t)(const ts_trigger_evt_t* evt);

uint32_t ts_set_trigger(uint32_t target_tick, uint32_t ppi_endpoint, ts_evt_handler_t callback);

If you are interested, I can try to create a pull request and you can merge this functionality into your repository. I added some other modifications; for example, I'm using nrfx_ppi_channel_alloc instead of manually specifying the PPI channels.

That would be great! The PPI and callback functionality you added seems like a very useful addition that I would like to try out and include in the repo.

arzippi commented 3 years ago

Hi @nordic-auko,

did you have time to look at my pull-request yet? Do you need more information / code from me?

nordic-auko commented 3 years ago

Hi @nordic-auko,

did you have time to look at my pull-request yet? Do you need more information / code from me?

Hi @arzippi, sorry for dragging on with this. I've only briefly looked at the pull request. It'll take me some work to set things up and do meaningful testing on accuracy. Will try to get this done by next week.

And thanks for taking the time to set up the PR!

arzippi commented 3 years ago

Hi @arzippi, sorry for dragging on with this. I've only briefly looked at the pull request. It'll take me some work to set things up and do meaningful testing on accuracy. Will try to get this done by next week.

Hi @nordic-auko, no problem. Let me know if you need any more information or if I can support you somehow.

nordic-auko commented 3 years ago

Hi @arzippi, sorry for dragging on with this. I've only briefly looked at the pull request. It'll take me some work to set things up and do meaningful testing on accuracy. Will try to get this done by next week.

Hi @nordic-auko, no problem. Let me know if you need any more information or if I can support you somehow.

Thanks! I'll let you know when I have some results

nordic-auko commented 3 years ago

PR merged