harp-tech / protocol

Description of the Harp protocol.
https://harp-tech.org/protocol/BinaryProtocol-8bit.html
MIT License
3 stars 5 forks source link

Critical synchronization bug results into going back in time #48

Open bruno-f-cruz opened 1 month ago

bruno-f-cruz commented 1 month ago

Consider a rig where a FLIR camera is being externally triggered by an harp behavior board using DO0. When looking at the difference between the timestamps of adjacent frames and triggers, I get the following plot:

image

As you can see, while the camera operates at a steady 8.3ms per frame, two frames in >400k seem to have a lower-than-expected timestamp.

The two "faulty" frames look like:

Frame 187419 has a timestamp difference of 0.00669 s
                FrameAcquired MessageType     Diff
Time                                              
2992896.986816           True       EVENT 0.008352
2992896.995136           True       EVENT 0.008320
2992897.001824           True       EVENT 0.006688
2992897.010176           True       EVENT 0.008352

Frame 341022 has a timestamp difference of 0.00621 s
                FrameAcquired MessageType     Diff
Time                                              
2994176.990112           True       EVENT 0.008352
2994176.998432           True       EVENT 0.008320
2994177.004640           True       EVENT 0.006208
2994177.012960           True       EVENT 0.008320

One import point, is that the bug seems to happen around the second roll-over so I am wondering if this is a problem related with the clock synchronization. It is kinda weird that according to the camera the period remains stable but not according to the behavior board. I suspect this might be because the PWM task is running on the MCU clock and not on the Harp clock. Once the clock synchronizes (and corrects the harp clock) it results in a mis-timestamped trigger as seen by the non-linear harp clock. While this seems like an ok behavior in general if the correction of the harp timestamp is small, it seems weird that such a big gap is found (2ms) between the timestamped reported by the camera and the one reported by harp.

bruno-f-cruz commented 1 month ago

Updating this. Interestingly enough this seems to be a problem with the clock source (TimestampGeneratorGen3). The same problem is present in other registers of the same device (e.g. Analog Data)

image

But also in other devices (Olfactometer) that were acquiring data in the same experiment: image

bruno-f-cruz commented 1 month ago

More data on this bug

Other sessions produce a rather weird bug where the jump in time is actually closer to 1 full second:

                0      diff
Time                       
2487628.957536  1  0.008352
2487628.965888  1  0.008320
2487628.974208  1  0.008352
2487628.982560  1  0.008320
2487628.990880  1 -0.974720
2487628.016160  1  0.991392
2487629.007552  1  0.008320
2487629.015872  1  0.008352

I am not sure if this is the same bug or a mixture of two bugs (the one described above + non atomic update of the two time keeping registers).

bruno-f-cruz commented 5 days ago

@filcarv suggested connecting a clock generator and a subordinate board. Measure both synch led outputs. Unplug the synch cable and check how long it takes to drift the heartbeat.