microsoft / Azure-Kinect-Sensor-SDK

A cross platform (Linux and Windows) user mode SDK to read data from your Azure Kinect device.
https://Azure.com/Kinect
MIT License
1.49k stars 619 forks source link

External sync giving high latency between master and subordinate #1261

Closed neilDGD closed 3 years ago

neilDGD commented 4 years ago

Hola,

I had external sync working between a single master -> subordinate pair, and I received valid timings with 0uS drift according to the subordinate offset I set (200uS). Suddenly this morning I try and get some captures and now I am getting totally random time offsets between master and subordinate sensors, they can vary between a number of seconds to 10s of milliseconds, and these offsets sometimes change when I reset the sensors. Previously I had a problem where on startup I would have a consistent offset of 2 seconds, but that magically disappeared for a while until this started to happen.

Trying to work out the problem, I have just made sure that the master sensor is started last from any subordinates but although it changes the timings I am not seeing the perfectly in-sync frames I was seeing before now (bang on 200us, same as the offset I have defined for subs).

I have even tried on two PCs with different usb controllers and similar results. Starting Master last results in sync offset in the range of 10s of milliseconds.

I have also tried two different sync cables, same result.

I have also tried swapping master and subordinate sync out/sync in config and also different usb ports on both machines and tried to unplug and plug the sensors back in again and power cycling.

I'm on win10 on both machines (machine one: win10 pro build 1909, the other is an insider build of win10).

I could see the timings on the frames when starting the viewer app twice, once for master and once for subordinate, starting the sub first and then the master. I however could not see a delta value so its hard to see how much latency is between them in the app? Is there such information?

Any help would be gratefully appreciated! Are there any known solvable issues with external sync? Am I doing something wrong maybe, a magic api call i'm missing?

Many thanks,

Neil

wes-b commented 4 years ago

Take a look at #530. You should be using the newest firmware you can.

neilDGD commented 4 years ago

Thanks for the reply, I was on 1.4.0 firmware (AzureKinectDK_Fw_1.6.108079014.bin) but just reflashed. Same, this time about 2 milliseconds drift but that time is pretty random in my experience. I also set manual exposure. Is there a more recent firmware update compatible with 1.4.0 sdk with fixes?

thanks.

neilDGD commented 4 years ago

Ok I have found the reason. When I set the exposure setting in the table below any higher than -9 I get bad sync times, from -11 to -9 the time is bang on perfect. Is this supposed to happen or is it a bug?

static const color_exposure_mapping_t device_exposure_mapping[] = { //exp, 2^exp, 50Hz, 60Hz, { -11, 488, 500, 500 }, { -10, 977, 1250, 1250 }, { -9, 1953, 2500, 2500 }, { -8, 3906, 10000, 8330 }, { -7, 7813, 20000, 16670 }, { -6, 15625, 30000, 33330 }, { -5, 31250, 40000, 41670 }, { -4, 62500, 50000, 50000 }, { -3, 125000, 60000, 66670 }, { -2, 250000, 80000, 83330 }, { -1, 500000, 100000, 100000 }, { 0, 1000000, 120000, 116670 }, { 1, 2000000, 130000, 133330 } };

The master and subordinate are set with the following settings in this situation:

colour res: 720p depth res: 640 x 576 sub offset from master - 200uS

valid manual exposure values (mapped to the table above) are -11 to -6 but timings are incorrect from -8 to -6.

wes-b commented 4 years ago

What is your firmware version on this?

neilDGD commented 4 years ago

Latest firmware with 1.4.0:

image

wes-b commented 4 years ago

This might be a regression in the firmware. We need to repro and investigate.

qm13 commented 4 years ago

@neilDGD the unit test for sync passes. The unit test uses a fixed exposure of 8330us and power line setting of 60. The test randomly picks frame rates and delays to validate the timing works. You can find the code for the unit test here https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/61951daac782234f4f28322c0904ba1c4702d0ba/tests/multidevice/multidevice.cpp#L521

The unit test prints results similar to this.

Frame Rate: 15 Master depth_delay_off_color_usec: 59388 Sub depth_delay_off_color_usec: -19966 Sub subordinate_delay_off_master_usec: 63066

There may well be a regression that does not exhibit when the exposure is fixed at 8330us. Could you please review the unit test and identify the differences between the unit test and your repro.

tim-depthkit commented 4 years ago

Hey all, (Hi Neil!) I am also running into a seemingly random sync delay offset issue that is leading to interference between synchronized sensors.

I've taken a look at the unit test and run it, and it is currently passing just fine. However, upon inspecting the timestamps coming out of the test, I can identify the issue. Unfortunately the test is not currently written to compare the IR time from the master device to the subordinate device, which is where the seemingly random delay is exhibited.

I am running the test with the following options to simulate the recommended delay for synchronized devices in order to avoid interference:

multidevice_ft.exe --subordinate_delay 160 --fps 30 --s_depth_delay 0 --m_depth_delay 0

I get output that looks like this:

Master Color, Master IR(Delta), Sub Color(Delta), Sub IR(Delta)
---------------------------------------------------------------
Master too old m:   346033 s:   379477 adj sub:   379317 adj delta:    33284
   379322,    396022(16700),    379477(  155),    390500(11023) Stabilizing
   412655,    429366(16711),    412811(  156),    423833(11022) Stabilizing
   445988,    462688(16700),    446144(  156),    457166(11022) Stabilizing
   479333,    496033(16700),    479477(  144),    490500(11023) Stabilizing
   512655,    529355(16700),    512811(  156),    523833(11022) Stabilizing
   546000,    562700(16700),    546144(  144),    557166(11022) Stabilizing
   579322,    596022(16700),    579488(  166),    590500(11012) Stabilizing
   612666,    629366(16700),    612811(  145),    623833(11022) Stabilizing
   645988,    662688(16700),    646155(  167),    657166(11011) Stabilizing
   679322,    696033(16711),    679488(  166),    690500(11012) Stabilizing
   712655,    729355(16700),    712822(  167),    723833(11011) Stabilizing
   745988,    762700(16712),    746155(  167),    757166(11011) Validating
   779322,    796022(16700),    779488(  166),    790500(11012) Validating
   812655,    829366(16711),    812811(  156),    823833(11022) Validating
   845988,    862688(16700),    846144(  156),    857166(11022) Validating
   879333,    896033(16700),    879477(  144),    890500(11023) Validating
   912655,    929355(16700),    912811(  156),    923833(11022) Validating
   945988,    962700(16712),    946144(  156),    957166(11022) Validating
   979322,    996033(16711),    979477(  155),    990500(11023) Validating

We can see that the Sub Color delta is within the acceptable range of 160us, but when looking at the Master IR Delta as compared to the Sub IR Delta, we can also see that these are vastly different, and well ouside the 160us necessary to avoid interference.

If I add a column (last set of parens) for a direct comparison of the Sub IR and Master IR times to the output, it shows the following:

   745988,    762700(16712),    746155(  167),    757166(11011)(-5534) Validating
   779322,    796022(16700),    779488(  166),    790500(11012)(-5522) Validating
   812655,    829366(16711),    812811(  156),    823833(11022)(-5533) Validating
   845988,    862688(16700),    846144(  156),    857166(11022)(-5522) Validating
   879333,    896033(16700),    879477(  144),    890500(11023)(-5533) Validating
   912655,    929355(16700),    912811(  156),    923833(11022)(-5522) Validating
   945988,    962700(16712),    946144(  156),    957166(11022)(-5534) Validating
   979322,    996033(16711),    979477(  155),    990500(11023)(-5533) Validating

Note that the delay between Master and Sub IR time is ~-5533us, which is quite large, and well away from the 160us target.

The really surprising thing is that if I run this test multiple times, I get different deltas between master and sub IR times!

I saw deltas near -18211us and 8667us on two separate runs.

Additionally, if I modify the unit test to use a different exposure time, the magnitude of this delta seems to change.

Separately, in my own application I have logged timestamps for up to 5 synchronized cameras, and I noticed additionally that the "first" subordinate, as in the one with the smallest sync delay, usually 160us, is fairly consistently the one with a depth frame time delta that would cause interference, and all of the other cameras have a depth frame delta that is within the margin of error of an additional 160us offset from the previous subordinate device. I believe this to be the same issue, but it would require a heavy modification of the unit test to support more than 2 devices in order to test this hypothesis.

In conclusion, unless I am missing something, I believe there is a bug in the SDK or firmware that is causing this, and a simple modification of the unit test to compare the master IR timestamp to the subordinate IR timestamp should be enough to catch it in action.

neilDGD commented 4 years ago

Thanks for feedback on this issue guys, valuable to get this working correctly (and to fully roll out multi-cam support for k4a). Have been pulled onto other projects in the meantime but will return to test our multi-cam setup in the future (no time to investigate further atm). The influence of exposure time on synchronised frame delivery certainly seems to be an issue or atleast one factor. Thanks for the input @tim-depthkit.

wes-b commented 4 years ago

@tim-depthkit you did find an issue, but so far not the one we are looking for as far as I can tell.

First a bit about the test. The test randomly selects time values for the masters depth_delay_off_color_usec setting. It also randomly chooses a value for the subordinates depth_delay_off_color_usec and subordinate_delay_off_master_usec setting. So normally the test looks like this:

Test Running with the following settings:
                             Frame Rate: 30
      Master depth_delay_off_color_usec: -12276
         Sub depth_delay_off_color_usec: 7297
  Sub subordinate_delay_off_master_usec: 22896

Delta = Time off master color. All times in usec
Master Color, Master IR(Delta), Sub Color(Delta), Sub IR(Delta)
---------------------------------------------------------------
   366822,    354600(-12222),    389711(22889),    397000( 7289) Stabilizing
   400155,    387933(-12222),    423044(22889),    430344( 7300) Stabilizing
   433488,    421266(-12222),    456377(22889),    463666( 7289) Stabilizing
   466822,    454600(-12222),    489711(22889),    497011( 7300) Stabilizing
   500166,    487933(-12233),    523044(22878),    530333( 7289) Stabilizing
   533500,    521266(-12234),    556377(22877),    563677( 7300) Stabilizing
   566833,    554600(-12233),    589722(22889),    597000( 7278) Stabilizing
   600177,    587933(-12244),    623044(22867),    630344( 7300) Stabilizing
   633488,    621266(-12222),    656377(22889),    663666( 7289) Stabilizing
   666822,    654600(-12222),    689711(22889),    697011( 7300) Stabilizing
   700155,    687933(-12222),    723044(22889),    730333( 7289) Stabilizing
   733500,    721266(-12234),    756388(22888),    763677( 7289) Validating
   766833,    754600(-12233),    789722(22889),    796988( 7266) Validating
   800177,    787933(-12244),    823044(22867),    830344( 7300) Validating
   833488,    821266(-12222),    856377(22889),    863666( 7289) Validating
   866822,    854600(-12222),    889711(22889),    897011( 7300) Validating
   900166,    887933(-12233),    923044(22878),    930333( 7289) Validating
   933500,    921266(-12234),    956377(22877),    963677( 7300) Validating
   966833,    954600(-12233),    989711(22878),    997000( 7289) Validating
  1000177,    987933(-12244),   1023055(22878),   1030344( 7289) Validating

So the delta's in parenthesis correspond to the configured timing values the test chooses.

      Master depth_delay_off_color_usec: -12276
         Sub depth_delay_off_color_usec: 7297
  Sub subordinate_delay_off_master_usec: 22896

In this case the delay between depth and IR on the master should be -12276us (depth/ir) ahead of color. The subordinate color should be 22896us behind master. And the subordinate depth & IR frame should be 7297us slower than the subordinate color image. The precision of our clock is 110us so we don't expect these values to be perfected targeted by firmware.

Tim the command you ran encountered a test code bug with the command line parser. The value of 0 for the delays is what triggered the random delays to be used. So a value of 0 meant don't use use zero 🥴. This command

multidevice_ft.exe --gtest_filter=*multi_sync_validation* --m_depth_delay 0 --s_depth_delay 0 --subordinate_delay 160 --fps 30

should have been specified as this.

multidevice_ft.exe --gtest_filter=*multi_sync_validation* --m_depth_delay 1 --s_depth_delay 1 --subordinate_delay 160 --fps 30

Of course there was no way to know this without looking at the code. And we would get these results:

Test Running with the following settings:
                             Frame Rate: 30
      Master depth_delay_off_color_usec: 1
         Sub depth_delay_off_color_usec: 1
  Sub subordinate_delay_off_master_usec: 160

Delta = Time off master color. All times in usec
Master Color, Master IR(Delta), Sub Color(Delta), Sub IR(Delta)
---------------------------------------------------------------
   366822,    366866(   44),    366977(  155),    366988(   11) Stabilizing
   400166,    400200(   34),    400311(  145),    400311(    0) Stabilizing
   433500,    433533(   33),    433644(  144),    433655(   11) Stabilizing
   466844,    466844(    0),    466977(  133),    466977(    0) Stabilizing
   500155,    500200(   45),    500311(  156),    500322(   11) Stabilizing
   533500,    533522(   22),    533655(  155),    533644(  -11) Stabilizing
   566822,    566866(   44),    566988(  166),    566988(    0) Stabilizing
   600155,    600188(   33),    600311(  156),    600311(    0) Stabilizing
   633488,    633533(   45),    633644(  156),    633655(   11) Stabilizing
   666833,    666855(   22),    666977(  144),    666977(    0) Stabilizing
   700166,    700200(   34),    700322(  156),    700322(    0) Stabilizing
   733500,    733522(   22),    733655(  155),    733644(  -11) Validating
   766833,    766866(   33),    766988(  155),    766988(    0) Validating
   800166,    800188(   22),    800322(  156),    800311(  -11) Validating
   833488,    833533(   45),    833644(  156),    833655(   11) Validating
   866822,    866855(   33),    866977(  155),    866977(    0) Validating
   900166,    900200(   34),    900311(  145),    900322(   11) Validating
   933500,    933522(   22),    933644(  144),    933644(    0) Validating

While I was doing this I did add some test code to:

I still could not find a problem with the timing of the subordinate camera.

wes-b commented 4 years ago

Question about this issue, when you are seeing it, does it repro 100% of the time? If not can you estimate how often?