microsoft / Azure-Kinect-Sensor-SDK

A cross platform (Linux and Windows) user mode SDK to read data from your Azure Kinect device.
https://Azure.com/Kinect
MIT License

Very high latency with all sensor types, regardless of framerate #816

Open Chris45215 opened 4 years ago

Chris45215 commented 4 years ago

This is likely the main cause of issues 514 (https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/514) and 804 (https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/804)

Describe the bug
The camera latency (the time delay between reality and the output of that frame from the camera) is 0.2 seconds for both the RGB and depth cameras - it's as though the camera is buffering 4 or 5 frames internally. This pushes the body tracking latency to 0.25 seconds, and indicates that the tracking algorithm itself is fairly fast (requiring only 0.05 seconds); the perceived long delay is caused by the latency before the tracking algorithm receives the depth frame. For comparison, the Kinect For XBox One has a body tracking latency of around 0.12 seconds and half that for RGB and depth sensor data. For more modern comparisons, the Oculus Rift S's cameras have about a 0.02 second latency for room tracking, and some of the current Intel RealSense cameras boast a 0.006 second latency for similar tasks - that's not body tracking, but it shows how quickly the data can get from the camera sensor to a processed output. ROS-Industrial maintains a list of Robot Operating System-compatible depth cameras at https://rosindustrial.org/3d-camera-survey and all have significantly shorter latency, so it is reasonable to assume that the Azure Kinect should not have a 4+ frame latency when comparable and predecessor cameras manage 1 frame.

To Reproduce
1. With the Kinect plugged in, start the bundled Azure Kinect Body Tracking Viewer app (Azure Kinect Viewer works as well). Maximize the output window.
2. Stand in front of the Kinect while facing the computer screen; ensure you can clearly see the screen.
3. Pull out your phone, start a camera or video camera app, and set it to the highest framerate available.
4. Tell your phone to begin recording, and point it at your computer screen so it can see the body tracking output. We'll assume you are holding the phone with your right hand.
5. Reach out with your left hand (or whichever hand isn't holding the phone), and ensure that you can clearly see your extended arm in the Body Tracking output and can also see that hand on your phone's camera.
6. Suddenly and rapidly pull that extended hand downwards, while holding your phone camera steady.
7. Repeat steps 5-6 a few times, then end the recording.
8. Play back the video in a player that allows frame-by-frame stepping (VLC does, if you wish to transfer the video to your computer). Count the number of frames recorded by your phone between the start of your arm movement and the beginning of the arm movement shown in the Azure Kinect Viewer.
9. Divide that number of frames by the framerate to find the latency of the Azure Kinect camera. For example, a delay of 10 frames with a phone camera framerate of 60fps gives 10 / 60 ≈ 0.166 seconds of latency.

You can make more accurate tests, but this is an easy one that requires no programming.

Expected behavior
The delay between the real-world action and the Azure Kinect's perception of that action should be 0.1 seconds or less; preferably it should be 1 frame at the camera's best framerate for those settings. This should be true regardless of the active sensor type. The Kinect For XBox One had a latency of approximately 0.06 to 0.08 seconds; one would expect similar or better from a more modern, more expensive camera.


Additional context
The test described in bug 804 is more accurate than the test I described, and it measured a latency of 0.14 seconds, which agrees with the results I described after estimating the extra time required for image rendering. Reductions to the lowest possible settings yielded a depth frame latency of 0.1 seconds, which is nearly twice the latency of the Kinect For XBox One - and this is when the Azure Kinect is operating at a lower resolution than the Kinect For XBox One. Latency is a critical factor for robotic uses, so this flaw makes the Azure Kinect inferior to the deprecated Kinect For XBox One.

StevenButner commented 4 years ago

I am pleased to learn that the latency issue is now getting more attention. If it helps, I've attached a small JavaScript program that you can run in any browser to help assess the latency. The program puts up an image of the host system's timestamp, and it goes to some length to display the tenths, hundredths, and thousandths of seconds in two forms. The first is the simple HH:MM:SS.mmm and the second treats each digit of the milliseconds as a slider ---- e.g. ......7 or ..3 --- such that the length of the slider gives as much information as the digit itself. This is important because the display on your system probably only shows frames at 60Hz, and there is also some persistence that causes artifacts in any given image from one or more previous ones.

To run the JavaScript, save the code snippet below in a file, e.g. /tmp/msecTimer.html. Then run your browser and give it the URL "file:///tmp/msecTimer.html". This should start it running, with the result that there is a time display in large letters on your browser screen.

The next step is to position your Kinect so that it takes a picture of the monitor that is displaying the time. Thus, in every frame you can see what the system time was when that frame was displayed. Faster monitors are better than slower ones .... you will see why when you view the frames captured by the Kinect.

The final step requires some programming ... but not much. When you get a capture and extract the color image from it, record (or print out) the frame's timestamp in such a way that you can figure out which image goes with which timestamp. You can use either the K4A "system" timestamp (the one in the K4A library that is in nsec) or the "hardware" timestamp (in usec). In my robotics application, I do some computations to figure out the relationship between the host system's time and the timestamp used by the K4A's hardware timestamp. Once that offset is known, it is easy to adjust every frame's timestamp so that it uses the same time epoch as the host system. It is via this method that I found the latency on my system to be 140 msec.
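
To make the method concrete, here is a minimal sketch (assuming a Linux host and an already-obtained capture; this is an illustration, not the actual test program) of the logging step, using the K4A C API's timestamp accessors:

```c
// Sketch: log host time vs. the K4A "system" (nsec) and "hardware"/device
// (usec) timestamps for one capture, to correlate the clock domains.
#include <k4a/k4a.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static void log_frame_timestamps(k4a_capture_t capture)
{
    k4a_image_t color = k4a_capture_get_color_image(capture);
    if (color == NULL)
        return;

    // Host time when the frame is in hand (the K4A system timestamp is also
    // based on CLOCK_MONOTONIC on Linux, so the two are directly comparable).
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    uint64_t host_ns = (uint64_t)now.tv_sec * 1000000000ULL + (uint64_t)now.tv_nsec;

    uint64_t sys_ns = k4a_image_get_system_timestamp_nsec(color);  // nsec
    uint64_t dev_us = k4a_image_get_device_timestamp_usec(color);  // usec

    printf("host=%llu ns  system=%llu ns  device=%llu us  approx latency=%.1f ms\n",
           (unsigned long long)host_ns, (unsigned long long)sys_ns,
           (unsigned long long)dev_us, ((double)host_ns - (double)sys_ns) / 1e6);

    k4a_image_release(color);
}
```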

msecTimer.html.gz

Credit goes to https://codepen.io/jasonleewilson/pen/gPrxwX from which my program was derived.

Chris45215 commented 4 years ago

@tesych, I'm sure you are busy, but is there any news on this issue? Have you been able to replicate it? I expect it's not an unavoidable problem with the camera, as it would be very unusual (and unlike Microsoft) to release such a modern camera with latency worse than that of the Kinect For XBox One. Some of our applications are very dependent on latency, and we are happy to continue using the camera while the issue is being resolved. But if it can't be resolved, we need to include additional cameras from other makers in our system - and the earlier we can know this, the better. The Kinect For XBox One and the Orbbec Astra Embedded S are both low-latency cameras with body tracking, and the RealSense has low latency but lacks body tracking in its SDK - we may need to add these as adjunct sensors for our system. Thanks,

Greendogo commented 4 years ago

@tesych This is also a big problem for us, so any updates would be appreciated - can we have some information about what's going on with this, if possible? Having several thousand dollars' worth of equipment sit around unusable without a bit of commentary from the developers is a bit troubling.

Thank you for any updates!

skeptopic commented 4 years ago

We are also having this issue, and it's a real problem for us. We are considering abandoning the Azure Kinect as the latency ruins interactivity! It's a pity because it's such a great piece of hardware, any updates?

kmjett commented 4 years ago

Going to chime in to say that my company is also going to abandon the Azure Kinect if this is not addressed. We have a stockpile of Kinect V2s and will move onto using the Orbbec Astra if this remains an issue.

AutomationIntegration commented 4 years ago

I will also throw my hat into the ring for this. @tesych and @wes-b, is this an issue that can be fixed? The latency is currently the largest issue my company is seeking to resolve, and we have customers asking when they can purchase our systems. If the fix requires better hardware that makes the cameras cost two or three times as much, our customers will still be satisfied with the result, though obviously that is not ideal for my company and it would push the cost out of reach of many small developers.

We are currently sitting on 8 of these cameras for a single installation, and we expect each installation will require between 4 and 8 cameras. Our leading customer has identified 200 potential sites (mostly theaters and arcades) that would be suitable for installations, if the technology works and has good responses from guests, and we expect that this would drive interest from other customers. We would like to accelerate this as much as possible.

tesych commented 4 years ago

Thank you everyone for your patience while our team investigates the reported latency. It would be much appreciated if @AutomationIntegration, @kmjett, @NeilChatterjee, @Greendogo could share the following information to help our investigation:
• The actual numbers you see
• The reason why latency is an issue for your business (actual scenarios where Azure Kinect DK is used)
• The actual numbers you see for Kinect V2, if you are comparing with Azure Kinect DK
In the meantime, we are working on a new firmware release that we believe will improve the latency, and we will be happy to share the prerelease version with you all so you can test whether it improves your scenarios. You can download the Nuget package to test the new firmware. Also, we have another issue, #966, open that may be related to this conversation.

AutomationIntegration commented 4 years ago

@tesych thank you very much. Is there an email address or messenger by which I can send some of that requested information? I can send demonstrations that show why latency is an issue, and I suspect Microsoft would be very interested in these applications. I'm happy to send them, but we cannot post some of the information to public forums for intellectual property reasons, so email or message would be preferred.

Thank you,

tesych commented 4 years ago

@AutomationIntegration, please contact me on LinkedIn and I will be happy to share my information with you.

StevenButner commented 4 years ago

So good to hear that the high latency is currently being addressed. I am in complete resonance with @Greendogo , @AutomationIntegration , @kmjett , @NeilChatterjee , and others who are apparently trying to deal with the integration of one or more K4A cameras in commercial product developments where the too-high latency is a non-starter.

You (@tesych) have asked for explanations why latency is important for our applications, for hard numbers from measurements, etc. I am a developer for an autonomous robotic system that uses 2 Azure Kinect depth cameras plus a Hokuyo LiDAR for sensing the environment around the robot. The application features SLAM mapping, autonomous navigation with obstacle avoidance, as well as teleoperation with obstacle avoidance. These systems must be able to operate in a busy environment where there are people walking around in the same space where the robot operates. Also, there can be unmapped obstacles, such as furniture or equipment that are transiently present (that is, such obstacles may not have been present when the SLAM map was made).

The two synchronized K4A cameras are mounted on a mobile 4-wheeled holonomic robot with one camera (known as the "down" camera, abbreviated "DN") mounted about 43" above the floor level looking front and down approximately 43 degrees below horizontal and the other camera (known as the "up" camera, or simply "UP") mounted about 6.5" above the floor, looking front and up about 53 degrees from horizontal. The LiDAR sees a 2D planar view of 270 degrees centered straight ahead and about 9" off the floor.

We use these sensors to build SLAM maps of the customer site. These maps can be subsequently marked up to identify various destinations (for autonavigation), to delineate regions that have special properties (like speed limits, doorways, ramps, highly reflective areas where certain sensors should be ignored, etc).

It is critically important for our robotic application that all sensed data acquired while the robot is moving be accurately timestamped so it is possible to know (as nearly as possible) the pose of the robot in space at the moment each sensor acquired its readings. Additionally, having a reading (from any of the robot's sensors) with a latency larger than about 100 msec represents a condition where the data is simply too old to be useful.

Currently we are going to great lengths to try to work around the large latency seen in the K4A cameras. I've enclosed a snippet from a recent test program's log file that shows some actual timestamp data. More about that in a moment. First I want to describe some of the problems that we must contend with simply because each depth frame coming from either of our K4A cameras is delivering a depth image showing how far various sensed obstacles were .... about 140 msec ago. To illustrate my point, think about a system that is translating at approximately 0.7 meter/sec and simultaneously rotating at ~25 deg/sec. Note that the robot velocities I have cited are, in general, not constant; i.e. the robot is often accelerating or decelerating. One cannot simply use the velocity in a given direction to interpolate the robot pose based on a timestamp difference. In order to do better, we keep a rolling, time-stamped history of our robot odometry over the last 0.5 sec. When a depth camera observation is received, we use the timestamp difference (i.e. the frame latency) to tell us how far back in the odometry history to go. We then use that historical odometry reading to tweak the (normally constant) pose of the camera on the robot so that we can give a view of the world that is consistent across all of the sensors at a given moment of time. If we had a camera with lower latency (preferably under 50 msec), we would not need to do these extra manipulations. Simultaneous with this computation, we must also match the most recent (by timestamp) LiDAR reading with the camera so that we maintain a meaningful set of readings taken at nearly the same time.
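
An illustrative sketch of that rolling-history workaround (not the robot's actual code; the sample type, field names, and buffer depth are assumptions for illustration) could look like this: a fixed-size ring of timestamped poses plus a nearest-timestamp lookup.

```c
// Sketch: rolling odometry history with nearest-timestamp lookup.
#include <stdint.h>

#define ODOM_HISTORY 64 // at ~100 Hz odometry, roughly 0.64 s of history

typedef struct { uint64_t t_ns; double x, y, theta; } odom_sample_t;

static odom_sample_t history[ODOM_HISTORY];
static int head = 0;

void odom_push(odom_sample_t s) // called from the odometry loop
{
    history[head] = s;
    head = (head + 1) % ODOM_HISTORY;
}

// Return the recorded pose whose timestamp is nearest to frame_t_ns (the
// depth frame's capture time after conversion to the host time epoch).
odom_sample_t odom_lookup(uint64_t frame_t_ns)
{
    odom_sample_t best = history[0];
    uint64_t best_err = UINT64_MAX;
    for (int i = 0; i < ODOM_HISTORY; i++)
    {
        uint64_t err = (history[i].t_ns > frame_t_ns) ? history[i].t_ns - frame_t_ns
                                                      : frame_t_ns - history[i].t_ns;
        if (err < best_err) { best_err = err; best = history[i]; }
    }
    return best;
}
```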

Hopefully, the scenario above helps to motivate the need for accurate timestamps and low latency. As an experiment to help understand the actual timestamp data that is available in each K4A frame, I have written a test program that simply loops forever (until killed), capturing frames and logging the system CPU timestamp (on our Ubuntu 18.04 system running on an Intel Core-i7 64-bit system), together with the K4A's "system timestamp (nsec)" and, optionally but not shown here, the K4A's "hardware timestamp (usec)" from each of our 2 depth cameras. Our system cycle rate is 5Hz, so we have the K4As synchronized, with the UP camera as the master and the DN camera as the subordinate. We are acquiring BGRA color pixel data at 720P and depth data at 512x512 (WFOV_2X2BINNED).

The log file snippet shown below was taken after the cameras had been streaming for 2 hours and 49 minutes. For anyone interested, I can provide the full file of timestamps taken throughout the whole run. It's not attached here because of file size (40.7 Mbytes). I have data from a similar multi-hour run where I used the K4A's internal "hardware (usec)" timestamp. That one, of course, has the problem that it will, in general, drift with respect to the times measured by a system's CPU clock. For that reason, we must use the K4A's "system (nsec)" timestamp.

k4a-captureTiming

The data item that requires the most explanation in the above listing is the one labelled "latency:". The value given with this label is the difference between the Ubuntu system time when the captured frame was first in hand and the timestamp representing the actual moment when the camera captured the image. Though the K4A documentation tells us that the so-called "system" (nsec) timestamp represents the middle of the exposure time, this isn't terribly helpful since we cannot pull back the exact exposure time that was used for that image. We need to run the camera with its automatic exposure control enabled in order to ensure that we get good images even when the robot moves between darker and lighter areas. Yet we want to know quite exactly when the image was taken (so we can match that time with our historical odometry). If we look in the corresponding photo (which, for this particular test program, contains a textual overlay of the Ubuntu system time), it is possible to read out at least two digits of fractional latency (sometimes three). A sample image from our down-looking camera is below:

camera-DN-10:43:06 263993

After heuristically interpreting the screen image as captured by the K4A in the above image, I believe that the image has an actual latency of 120 msec (capture timestamp minus the displayed time). In this case, I believe that the displayed time during the exposure interval was 10:43:06.144 whereas the timestamped frame had 10:43:06.263993. Note that the latency value printed in the sample log file doesn't reflect the true latency. It simply adds 40 msec to the timestamp it picks up from the frame (plus a time offset that was saved when the two cameras were initialized at the start of streaming).

For the purposes of the system I've been developing (and perhaps many others), it would be more useful if we could get the time stamp of the beginning of the exposure interval rather than the middle. In fact, a very useful feature enhancement would be to have the reference point settable by the user! Why not allow, for example, the user to choose via the camera's configuration, the beginning, middle, or end of exposure time as each frame's reference moment?

I look forward to a version of the K4A (soon) with improved latency .....

Chris45215 commented 4 years ago

Here is a screenshot demonstrating the latency of the Azure Kinect camera with the v1.3.0 SDK: https://imgur.com/a/Z1CkBCl. The monitor has a refresh rate of 60Hz and I used the stopwatch at http://ipadstopwatch.com/full-screen-stopwatch.html. The live (real) time is 33:650. The most recent color frame was from 33:520, giving a latency in the RGB camera of 130 milliseconds. The values in the infrared camera are just barely visible, but I believe it shows 33:476, which gives an infrared latency of 174 milliseconds. The camera is at the lowest resolution and highest framerate options in order to reduce the latency as much as possible.

To reproduce this result:
1. Set up your Azure Kinect camera so it points at your monitor from a distance of around 400 millimeters.
2. Go to http://ipadstopwatch.com/full-screen-stopwatch.html in your browser of choice.
3. Press the Start button on that page.
4. Move the window so that it occupies only the lower portion of your monitor.
5. Open Azure Kinect Viewer v1.3.0.
6. Resize Azure Kinect Viewer so it fills the top portion of your screen, but the stopwatch on the webpage is clearly visible.
7. Tell Azure Kinect Viewer to open the camera.
8. Wait a few seconds to allow the program to open the camera and start streaming.
9. Press the Print Screen button, or use the screenshot method of your choice.

If anyone can demonstrate a latency below 90ms with the Azure Kinect in this test, I would love to know their system specs.

NickAtMixxus commented 4 years ago

Thanks for the update. I would also like to chime in on this. I have not tried the Azure Kinect because it's not available yet in my area. However, latency equal to (or better than) the Kinect V2 is a priority for my work: games and apps for recreation and rehabilitation. (Some are already made for Kinect V2 and on the Microsoft Store; I expected to be able to upgrade those and make more with the Azure Kinect.) To answer @tesych on why latency is important: you control the games solely with your body, and since they are for fun, exercise, and/or rehab, the camera response needs to be fast and precise - following precise small step heights for people who may have difficulty moving, or high steps for those who want exercise as in the Country Ramble options, kicking a fast-approaching wall ball, or shooting/aiming and flying fast as in Astronaut Journey. The Kinect V2 does this job excellently! (Maybe it is best to try them out on the Microsoft Store to fully understand why latency is important.) However, since the Kinect V2 is withdrawn and this new Azure Kinect is not yet available here, they can't be upgraded. (They're made in Unity as Store 8.1 apps, but Kinect V2 can't be built for UWP.) I was excited to hear about the new Kinect, but this is a bit of a limbo state. I understand you're working on this, but it would be great if you could hint: are we to expect latency equal to (or better than) the Kinect V2? Is there a rough timeline for when it will be available in other areas and for consumers?

sonnybsj commented 4 years ago

@tesych Thank you for the update. I don't have any numbers yet but I wanted to +1 on the issue and also provide another example scenario where latency is an issue.

My company has a Kinect v2 app that's used in hundreds of facilities (and growing) in the US. Our app is for physical rehabilitation (similar to the comment above). We're currently looking at the Azure Kinect and other sensors as a potential replacement.

The Azure Kinect website currently lists "Health and life sciences" as one example industry for the sensor. "Enhance physical therapy, improve and monitor athletic performance, and rehabilitate patients faster with real-time feedback based on data from the Body Tracking SDK (preview)."

I think low latency is essential for many apps in this category, and for interactive apps & games in general.

wes-b commented 4 years ago

Latency is difficult to measure and everyone defines it slightly differently. I think we can all agree that latency measurement begins with the capture of an image. However, we all have different definitions of the end point of the measurement. For those of you working on full-scale customer implementations, your latency measurement ends with the user seeing the final image or a robot being able to react to what it has seen. For the Azure Kinect Sensor SDK we define it at the boundary of our API. The Body Tracking SDK adds latency on top of the sensor SDK, so we can talk about the two independently.

Sensor SDK latency can come from different places:

The SDK by default attempts to pair up the depth and color images in the k4a_capture_t. This means one image has to be held by the SDK while waiting for the matching image from the other camera. As a result, the latency of the combined depth and color image ends up being the slower of the two settings. The environment variable K4A_DISABLE_SYNCHRONIZATION=1 will turn off this synchronization and deliver the depth or color image as soon as the sensor SDK receives it, ensuring an image is never held on to. The drawback is that the image processing thread will run twice as often, as the color and depth will never be delivered together. (See the sketch after this list.)

Image resolution affects latency because the larger the image is, the more data there is to move from the camera sensor, over USB, to the user. Choose a smaller resolution to minimize latency from the color camera.

Image formats that are not native to the camera, like BGRA32, add CPU compute to the image processing path because the image has to be converted by the CPU. Use a camera-native format like MJPEG, YUY2, or NV12 to minimize latency. To minimize the latency for non-native formats like BGRA32, host PCs with higher compute can be leveraged.

Exposure time indirectly increases image latency. Exposure itself doesn't change the image size or the work to transfer the image from the sensor to the host PC. We do, however, apply our device timestamp at the center of exposure rather than the start of exposure, so a long exposure (e.g. 33ms) can add to the camera's apparent latency.

GPU performance can add to the depth image path, as the GPU is used on the host PC to convert raw sensor data to a depth image. The sensor SDK will generate a warning message if this conversion takes more than 33.3ms; you may consider adding instrumentation to figure out how much time a struggling GPU might be adding.
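
Pulling the settings-related suggestions above together, here is a minimal sketch of a low-latency capture loop using the public K4A C API (assuming a POSIX host for setenv; on Windows the variable would be set in the shell instead):

```c
// Sketch: low-latency capture - small native-format color, binned depth,
// and depth/color synchronization disabled.
#include <k4a/k4a.h>
#include <stdlib.h>

int main(void)
{
    // Deliver each image as soon as it arrives instead of pairing depth+color.
    setenv("K4A_DISABLE_SYNCHRONIZATION", "1", 1); // set before opening the device

    k4a_device_t device = NULL;
    if (k4a_device_open(K4A_DEVICE_DEFAULT, &device) != K4A_RESULT_SUCCEEDED)
        return 1;

    k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
    config.camera_fps       = K4A_FRAMES_PER_SECOND_30;
    config.color_format     = K4A_IMAGE_FORMAT_COLOR_YUY2;   // native format: no CPU conversion
    config.color_resolution = K4A_COLOR_RESOLUTION_720P;     // smallest color mode
    config.depth_mode       = K4A_DEPTH_MODE_WFOV_2X2BINNED; // low-latency depth mode per the tables below

    if (k4a_device_start_cameras(device, &config) != K4A_RESULT_SUCCEEDED)
    {
        k4a_device_close(device);
        return 1;
    }

    k4a_capture_t capture = NULL;
    while (k4a_device_get_capture(device, &capture, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
    {
        // With synchronization disabled, a capture holds whichever image
        // (depth or color) just arrived; process it and release.
        k4a_capture_release(capture);
    }

    k4a_device_stop_cameras(device);
    k4a_device_close(device);
    return 0;
}
```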

It is also worth talking about the difference between system time and device time. For Azure Kinect, the system time is captured by the host PC after the image has been completely received by the host. This unfortunately does not capture the time the image spends on device or being copied over USB. The device timestamp is captured by the hardware at the center of image exposure. This center of exposure gives us the best sense of a start time that we can use to measure the latency.

To measure this latency I used a different method than what others have reported doing. Using another camera to record a clock on a screen will certainly measure the full latency of the pipeline, but it is not useful for knowing where the latency is coming from. It can measure many factors that are unique to the host machine, because it relies on images being rendered to the screen. To measure latency I have relied on the IMU device timestamps. The theory is that they arrive often and are small enough that the USB transfer time is insignificant compared to the image transfer time. The IMU rate is 1666Hz and the samples arrive in batches of 8, which means we get a new message from the hardware every 4.8ms. If, at the moment the IMU sample is received by the host, we capture the system timestamp, we can correlate the two clock domains. Assuming the latency of IMU data is 0ms in the best case or 5ms in the worst case, this process gives us a maximum error of 9.8ms - enough of a ballpark to know how large the latency is.
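
A rough sketch of that correlation step might look like the following (an illustration, not the actual test code; it assumes k4a_device_start_imu() has already been called):

```c
// Sketch: correlate the device clock with the host clock via IMU samples.
#include <k4a/k4a.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

void correlate_clocks(k4a_device_t device)
{
    k4a_imu_sample_t sample;

    // IMU batches arrive roughly every 4.8 ms (1666 Hz, batches of 8).
    if (k4a_device_get_imu_sample(device, &sample, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        uint64_t host_us = (uint64_t)now.tv_sec * 1000000ULL + (uint64_t)now.tv_nsec / 1000;

        // Offset between the device clock domain and the host clock domain;
        // apply it to image device timestamps to express them in host time.
        int64_t offset_us = (int64_t)(host_us - sample.acc_timestamp_usec);
        printf("clock offset: %lld us\n", (long long)offset_us);
    }
}
```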

Here is the latency I captured for the Azure Kinect with firmware 1.6.107078014 using different color and depth modes. This latency is the time from the moment the shutter opens, capturing the first pixel, to the time that k4a_device_get_capture() returns a capture. It does not add the 9.8ms worst-case error coming from the IMU timestamps. It also doesn't account for any latency added by our C++ or managed wrappers.

Rough Linux Latency (ms) from Color Camera:

Resolution    MJPG  NV12  YUY2  BGRA32
4096 x 3072   104   -     -     182
3840 x 2160   65    -     -     115
2048 x 1536   53    -     -     71
2560 x 1440   62    -     -     82
1920 x 1080   60    -     -     72
1280 x 720    55    62    53    66

Rough Windows Latency (ms) from Color Camera:

Resolution    MJPG  NV12  YUY2  BGRA32
4096 x 3072   111   -     -     168
3840 x 2160   72    -     -     161
2048 x 1536   59    -     -     72
2560 x 1440   67    -     -     85
1920 x 1080   63    -     -     75
1280 x 720    62    66    59    68

Rough Latency (ms) from Depth Camera:

Depth Mode     Resolution   Linux  Windows
NFOV Binned    320 x 288    42     36
NFOV Unbinned  640 x 576    42     35
WFOV Binned    512 x 512    33     27
WFOV Unbinned  1024 x 1024  73     62
Passive IR     1024 x 1024  15     13

PC Spec:

Body tracking processes a depth and IR frame and returns a 3D skeleton. Its system load is split between the GPU and CPU. With the spec’d minimum hardware of an NVIDIA 1070 and Core i5, latency is typically 28ms, which will sustain 30 fps.

In closing, these numbers aren't perfect, but they are reasonable. Using smaller image resolutions results in lower latency. Kinect V2's highest resolution offered for color was 1080p, and 512 x 512 for depth; our latency is in line with Kinect V2's for those same resolutions. BGRA32 modes add latency because the CPU has to decompress and convert the image, so they have higher latency than MJPG. On the depth side, WFOV Unbinned is the most latent, but due to image size its maximum rate is 15 FPS. Otherwise, latency on the other modes is quite a bit lower.

So for those that are experiencing latency beyond these numbers, I invite you to attempt our test and get your own numbers.

As for running the latency test, please bear with me; the test was designed for other purposes, and if I take the time to clean it up I may never get it shared.

Color Latency Test Commands:

Mode    Resolution  Command
MJPEG   3072p       latency_perf.exe --gtest_filter=15*/0 --exposure 33330
MJPEG   2160p       latency_perf.exe --gtest_filter=30*/0 --exposure 33330
MJPEG   1536p       latency_perf.exe --gtest_filter=30*/1 --exposure 33330
MJPEG   1440p       latency_perf.exe --gtest_filter=30*/2 --exposure 33330
MJPEG   1080p       latency_perf.exe --gtest_filter=30*/3 --exposure 33330
MJPEG   720p        latency_perf.exe --gtest_filter=30*/4 --exposure 33330
NV12    720p        latency_perf.exe --gtest_filter=30*/5 --exposure 33330
YUY2    720p        latency_perf.exe --gtest_filter=30*/6 --exposure 33330
BGRA32  3072p       latency_perf.exe --gtest_filter=15*/1 --exposure 33330
BGRA32  2160p       latency_perf.exe --gtest_filter=30*/7 --exposure 33330
BGRA32  1536p       latency_perf.exe --gtest_filter=30*/8 --exposure 33330
BGRA32  1440p       latency_perf.exe --gtest_filter=30*/9 --exposure 33330
BGRA32  1080p       latency_perf.exe --gtest_filter=30*/10 --exposure 33330
BGRA32  720p        latency_perf.exe --gtest_filter=30*/11 --exposure 33330

Depth Latency Test Commands:

Mode             Command
NFOV 2x2 Binned  latency_perf.exe --gtest_filter=30*/12
NFOV Unbinned    latency_perf.exe --gtest_filter=30*/13
WFOV 2x2 Binned  latency_perf.exe --gtest_filter=30*/14
WFOV Unbinned    latency_perf.exe --gtest_filter=15*/2
Passive IR       latency_perf.exe --gtest_filter=30*/15

StevenButner commented 4 years ago

Thank you, @wes-b !! Your description of the various components affecting latency on the K4A is valuable. I have used your latency_perf tests to measure latency on my system. In order to understand how to interpret the individually-reported latency timing results, however, I need to ask for a clarification: When a given configuration of color format, depth format and other factors is chosen, should we expect to observe an overall latency that is the sum of the individual component latencies --- OR ---- should we expect that the individual component latencies overlap one another in time? I believe they overlap such that the most latent component sets the overall observed latency. Can you confirm?

wes-b commented 4 years ago

The test can only measure from the center of exposure until the SDK API provides the image. So from the perspective of capture time, USB transfer time, GPU conversion, etc the measurement is inclusive of all those tasks. The test runs both color and depth at the same time so the resulting measurement is a union (longer) of the two camera latencies when K4A_DISABLE_SYNCHRONIZATION is not used. When K4A_DISABLE_SYNCHRONIZATION=1 is set then the two are decoupled and the tool can measure color and depth individually.

Chris45215 commented 4 years ago

@wes-b, thanks for all the work you have clearly put into this subject; we also appreciate the unique difficulty that latency measurements involve. We've assigned one of our team members to reproduce your tests, and we will post our system specs and results when they have completed them. Your results are significantly faster than our experience has been thus far, so we expect at least some improvement from following your own code and optimizations. We also have some plans to check for IMU latency - we don't expect it to be delayed, but we want to ensure we cover all the bases, just in case.

Thanks again,

StevenButner commented 4 years ago

@wes-b I second the remark above from @Chris45215 . The latency_perf test program plus the detailed breakdown of the components of latency that you wrote were extremely helpful. By changing the color format from BGRA32 to YUY2, I was able to cut nearly 50 msec off the latency as measured locally (moving it from ~140msec to approximately 92msec).

I have changed my usage away from the K4A's internal hardware timestamp (since it is asynchronous to my processor's real-time system) in favor of the K4A library's system timestamp. The latter has a bit more jitter (on Linux) than a hardware clock, but it is significant for my application that the K4A and host system timestamps never drift with respect to one another. Once I learned that the K4A library's system timestamp is based on Linux's CLOCK_MONOTONIC, it became easy to convert between my real-time system's timestamps and those included in each frame (and I do not need to make any adjustments for asynchronous clock drift).
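
The conversion amounts to sampling both host clocks back to back and applying the offset; a small sketch of that idea (an illustration, not the actual code) is below:

```c
// Sketch: map CLOCK_MONOTONIC-based K4A system timestamps to wall-clock time.
// Sample the offset once and reuse it; both clocks tick at the same rate.
#include <stdint.h>
#include <time.h>

static uint64_t ts_to_ns(struct timespec t)
{
    return (uint64_t)t.tv_sec * 1000000000ULL + (uint64_t)t.tv_nsec;
}

// Returns CLOCK_REALTIME minus CLOCK_MONOTONIC, in nanoseconds.
int64_t monotonic_to_realtime_offset_ns(void)
{
    struct timespec mono, real;
    clock_gettime(CLOCK_MONOTONIC, &mono);
    clock_gettime(CLOCK_REALTIME, &real);
    return (int64_t)(ts_to_ns(real) - ts_to_ns(mono));
}

// wall_clock_ns = k4a_image_get_system_timestamp_nsec(image) + offset
```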

Greendogo commented 4 years ago

Quick question @wes-b, beyond #1085 is there the possibility of more changes in the pipeline that improve latency? Having read some of your comments above, it looks like you're saying that it's all expected behavior already.

wes-b commented 4 years ago

We don't have anything planned currently. We do think it's all as expected. There is probably room to convert to BGRA32 more efficiently with different algorithms, but we should not be in the way of that.

Greendogo commented 4 years ago

What we want to understand is why we can't get a YUY2 (or some other format) frame out in less than 40ms.

We would be happy to have the RGB camera turned off so we can just obtain depth info in less than 40ms.

RoseFlunder commented 4 years ago

@Greendogo Did you test turning the color camera off via the resolution setting? K4A_COLOR_RESOLUTION_OFF
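
For reference, a depth-only start with the K4A C API would look something like this sketch (note that K4A_DEVICE_CONFIG_INIT_DISABLE_ALL already defaults the color resolution to off):

```c
// Sketch: start the depth camera with the color stream disabled.
#include <k4a/k4a.h>

k4a_result_t start_depth_only(k4a_device_t device)
{
    k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
    config.depth_mode       = K4A_DEPTH_MODE_WFOV_2X2BINNED;
    config.color_resolution = K4A_COLOR_RESOLUTION_OFF; // explicit for clarity
    config.camera_fps       = K4A_FRAMES_PER_SECOND_30;
    return k4a_device_start_cameras(device, &config);
}
```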

StevenButner commented 4 years ago

@RoseFlunder .... I can answer your question to @Greendogo since we work together on the same project. Unfortunately, our project uses multiple cameras and thus requires hardwired synchronization. That synchronization, in turn, requires that the color camera is streaming. We do not need the color images at all; our entire need is depth information. But (sigh) the path to getting that for this particular product involves many other, seemingly-unrelated requirements like this one (color camera streaming) plus the whole GPU-based processing of depth and all of the issues that go with that for a headless Linux system.

If only the processing currently done on the GPU could be moved onto the main CPU. It is hard to believe that there is that much computing required in order to justify the considerable effort and usage of resources that are being applied to the depth computation (via the GPU). But since that computation is proprietary, we will never know. One thing that seems obvious is that the latency we see in depth is likely due to all of the data movement, queueing, and synching between camera -> CPU -> GPU -> CPU etc., etc. This latency is unlikely to improve much unless there is some sort of a structural change.

It has always troubled me that this is a 5/15/30 fps camera system but the latency we see is always significantly greater than the frame interval of the fastest setting (30 fps), i.e. greater than 33 msec. I would suspect that, for most any real-time application where this camera is a candidate to be used as a sensor, you will find this property to be the single most important issue in choosing whether to use the camera over others that are on the market.

Chris45215 commented 4 years ago

@wes-b, I've been working with the branch you suggested, and I hope you can bear with me a bit, as C++ is not my native language.

I downloaded the full project (I had to download it as a Zip file, as cloning via GitHub Desktop was stopped by an authentication check which did not accept my username and password), then followed the instructions at https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/docs/building.md (with the exception of downloading as a Zip file) to build the project on Windows with Visual Studio 2017. I ran the verify-windows.ps1 script to ensure my computer is set up correctly, and got the result "Machine setup. Please use Visual Studio Developer Command prompt to build." After setting tests/latency/latency_perf.cpp as the startup project and telling Visual Studio to run in debug mode, line 579 of latency_perf.cpp gets the result K4A_RESULT_FAILED. A stack trace shows that the failure is k4a.c line 896 ---> depth.c line 397 ---> dewrapper.c line 553, which returns the error "Depth Engine thread failed to start".

It seems that I am doing some part of this wrong, and I probably overlooked a very basic step, so any guidance would be helpful.

Beyond that, we've been testing with body tracking on multiple Azure Kinects and have seen a performance drop when using 2, with lesser incremental drops upon adding more. With an RTX 2070 Super and 4 cameras performing only body tracking (ColorResolution = ColorResolution.Off, NFOV_unbinned), with each camera on a discrete computation thread and in standalone mode, we observe a body framerate of 12FPS for each camera, and the latency is about 600 milliseconds uniformly across all the cameras. The low framerate is expected if the GPU is the bottleneck, but the latency should be relatively unaffected aside from the extra delay from longer frame timings. I'm uploading a modification of the SDK's Unity example (https://github.com/microsoft/Azure-Kinect-Samples/tree/master/body-tracking-samples/sample_unity_bodytracking) to https://www.dropbox.com/sh/h80k2etzqysek2y/AACIeF6TuerQlqFDlEir99_Ga?dl=0 - to run it, just replace Assets\Scenes\Kinect4AzureSampleScene-DiscreteTrackers.unity with the uploaded one, replace the Assets\Scripts\main.cs and Assets\Scripts\SkeletalTrackingProvider.cs files with the uploaded ones, and do the same with the .meta files. The changes from the SDK version are very rudimentary, so it should be easy to adapt it to use more or fewer cameras. The only caveat is that it works best when a brief pause is added between camera initializations; otherwise some initializations can fail.

Among other thoughts to address the issue, how would the SDK handle multiple GPUs? We can put 2 or more GPUs in each computer; if there is a way to match a camera to a GPU, that might give optimal performance without requiring a dedicated computer for each camera.

wes-b commented 4 years ago

@Chris45215 you mind moving this to a new issue? There are several questions to unpack.

Chris45215 commented 4 years ago

No problem, I'll start dividing the questions tonight. The 4-camera latency case might need to wait until tomorrow, so I can record and upload a video demonstrating it

Chris45215 commented 4 years ago

@wes-b, I want to confirm something before I finish uploading a video demonstrating the issue definitively - the body tracking uses the most up-to-date information that it can, correct? I know it can't be perfectly real time, but it gets as close to real time as the depth sensor allows - right? And your earlier post said bodytracking has a processing time of 28ms with a GTX 1070; so an RTX 2070 Super should handle it at least as quickly.

Maybe there's a simple error in the particular code example from the SDK that I used, or maybe I changed something the wrong way, or maybe there is something in the documentation that I missed. But if not - does Microsoft have a bug bounty? Because documenting this has taken a lot of my company's time.

wes-b commented 4 years ago

If you run with the color camera off or use the env var I mentioned, then you will get the fastest performance from the sensor SDK. I am asking team members more familiar with BT to comment on that part.

Chris45215 commented 4 years ago

Here you will find a beautiful demonstration of the bug: https://youtu.be/7Jc7KhoPWdc. The tracker never outputs the result from the most recent frame; it always outputs the result from the third most recent frame. And that is when running at 5FPS – it is not slowed by the processing load.

I recorded the video at 60FPS with my phone, and that video plays it back at 1/5th that speed to make the issue easier to observe. The PC has an AMD 2700X CPU and a Nvidia RTX 2070 Super GPU. As said at the start of the thread – the camera and/or SDK is sitting on several frames for no reason. The result is similar if you run 5 or 6 sensors on the same computer at 30FPS with perfect multithreading – the GPU will split processing between them, and each will show the movement on its 3rd (or later) update rather than its first.

If this applies to the depth sensing as well as the body tracking, it perfectly explains why @StevenButner's most optimized setup yields a latency around 90ms when it should be much better.

To reproduce:
1. Open the SDK project Azure-Kinect-Samples/body-tracking-samples/sample_unity_bodytracking/.
2. Open the file SkeletalTrackingProvider.cs in your preferred code editor.
3. Change line 31 from "CameraFPS = FPS.FPS30," to "CameraFPS = FPS.FPS5,".
4. Save the changes.
5. Set up your Azure Kinect sensor.
6. Set up another camera which can observe you and the monitor. Preferably it should also observe the Azure Kinect.
7. Run the scene.
8. Record with your (non-Kinect) camera. Make a few rapid movements that are easy to mark.
9. Open this new video in any program which can view frame-by-frame. Windows Media Player is great at this; you can advance 1 frame at a time by pausing the video and Control-clicking the play button.
10. Count the frames between your movement and the onscreen avatar movement.
11. Pay special attention to the fact that the camera makes several captures in the time between your movement and the onscreen movement.
12. Count the camera flashes between your movement and the avatar movement.
13. Count the avatar updates between your movement and its reflection of your movement.

Hopefully, some part of the code is simply pulling frames from the wrong end of a buffer. If not – if the system is designed so that it waits for subsequent frames to refine its data – then that needs to be disclosed.

Chris45215 commented 4 years ago

After some additional tweaking, I did find some improvement that I could make to the Unity C# example code. Here is a new SkeletalTrackingProvider file (replace the ".txt" extension with ".cs"; github doesn't allow me to attach .cs files) that slightly helps alleviate the problem. The short explanation is that there is at least one extra frame added to the queue at startup, but it is not pulled by the .pop(), and thus the queue builds up and the program is indeed pulling stale frames. However, this only improves the issue by a single frame - the output will now appear after the 2nd sensor capture whereas before it appeared after the 3rd sensor capture. SkeletalTrackingProvider.txt

I do not know if this problem happens in the C and C++ versions of the code as well, someone else can check on that. And I repeat that this change is (at best) only halfway towards a solution, as body tracking is still an additional frame behind what it should be - if that remaining extra frame is fixed then the delay between the moment of capture and the output from the tracker is still longer than desired (is communication between CPU and GPU really that slow?). And I would suggest that someone revise my change (or the SDK) so that it ensures there are no more Tracker results available in the queue, rather than simply trying to pull a single extra result.

Perhaps the SDK should be changed so that it provides a .dequeue() to get the next FIFO result, and a .checkLatest() which returns the freshest result (perhaps without removing it from the queue, as that could cause a stale result in a subsequent call).

Greendogo commented 4 years ago

@Chris45215 - @StevenButner and I have discussed wanting such a function. It would be really sweet to get the latest result and ignore the older stuff.

yijiew commented 4 years ago

Hi @Chris45215, in order to isolate body tracking SDK issues from sensor SDK issues, could you try this synchronized offline processor? https://github.com/microsoft/Azure-Kinect-Samples/tree/master/body-tracking-samples/offline_processor. You can set a stopwatch before k4abt_tracker_enqueue_capture and after k4abt_tracker_pop_result to measure the time used to run the body tracking algorithm.

As for the additional queued frame: if you care about latency that much, you should always use synchronized calls (set the timeout to K4A_WAIT_INFINITE for both the enqueue and pop-result functions). You are then guaranteed to get the result of the frame you pass in.

@Greendogo The current design of the API enables you to do this. If you always want to get the latest result, you can implement a while loop that keeps calling the k4abt_tracker_pop_result function with a timeout of 0 until it fails. The last result you get is the latest; you can ignore the older results in your own code.
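
A sketch of that drain loop (an illustration using the k4abt C API; releasing superseded frames is the caller's responsibility) might look like:

```c
// Sketch: pop with a 0 ms timeout until the call fails; the last
// successful pop is the freshest body tracking result.
#include <k4abt.h>
#include <stddef.h>

k4abt_frame_t pop_latest_body_frame(k4abt_tracker_t tracker)
{
    k4abt_frame_t latest = NULL;
    k4abt_frame_t next = NULL;

    while (k4abt_tracker_pop_result(tracker, &next, 0) == K4A_WAIT_RESULT_SUCCEEDED)
    {
        if (latest != NULL)
            k4abt_frame_release(latest); // discard the superseded, older result
        latest = next;
    }
    return latest; // may be NULL if no result was ready at all
}
```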

Chris45215 commented 4 years ago

@yijiew I ran offline_processor.exe on a saved 30FPS recording (color and depth) that lasts 30 seconds; the processing time was around 22-24 milliseconds per frame, and I have a screenshot of the result at https://imgur.com/gallery/cLuL4jx. The recording consisted mostly of me walking around. But this showed something unexpected: my math says that 30FPS * 30 seconds = 900 frames, but the processing output shows 903 frames. Those extra 3 frames sound suspicious; they might be related to the 'stale frames' issue that I demonstrated in the video I posted earlier, so I made a few more recordings.

I made two new, fresh recordings using the command ".\k4arecorder.exe -d WFOV_2X2BINNED --imu OFF -l 30 output.mkv" - the first of these produced exactly 900 frames, the second produced 903. Both have a body tracking time of around 21 to 24 milliseconds.

I made a fourth, 30-second recording, this time getting only the depth and using the command "k4arecorder.exe -d WFOV_2X2BINNED -c OFF --imu OFF -l 30 output.mkv" (the last of the examples on https://docs.microsoft.com/en-us/azure/kinect-dk/azure-kinect-recorder, but changed from 5 seconds to 30 seconds) and it made an output with 901 frames, with a processing time around 22-24 milliseconds.

I can make a new, separate issue for these inconsistent frame counts, but I suspect that frame count inconsistency is a symptom rather than the root cause of a problem. My more important question is: are you able to reproduce the latency that I demonstrated in the video that I posted earlier?

yijiew commented 4 years ago

@Chris45215 Thanks for giving offline_processor.exe a try. It proved that the body tracking processing time is about 22-24 ms/frame. As for the inconsistent recording frame count, I think it is expected; the 30 FPS normally is not exact. If you suspect there are some "stale frames", you could print out the timestamps to see whether any two timestamps are the same. But of course, feel free to create another issue regarding this.

Before I try to repro this, do you know how much latency is due to the sensor SDK and how much is due to the body tracker itself? You just proved that the body tracker latency should be as small as 22-24 ms. If it is mainly caused by the sensor SDK, I will redirect this issue to @wes-b.

fractalfantasy commented 4 years ago

I've just submitted a feature request that would easily solve the latency issue. It'll need support and feedback to even be considered, so please upvote and chime in if you agree:

https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/39945454-legacy-body-tracking-like-kinect-v2

cc: @Greendogo @AutomationIntegration @kmjett @NeilChatterjee @StevenButner @NickAtMixxus @sonnybsj

Greendogo commented 4 years ago

I do not believe that the latency addressed by this issue, #816, is related to Body Tracking latency.

Are there any updates on this issue from anyone?

AutomationIntegration commented 4 years ago

@yijiew I am responding for both myself and @Chris45215 as he is part of my company.

Wherever the specific problem lies, I think it is absolutely clear to everyone from that 5FPS test that something, somewhere, is not performing correctly. We cannot isolate the exact source of the error for you (nor should my employees be expected to), but it is clearly there. Maybe the camera's firmware is holding a frame back and submitting an older frame than it should. Maybe the depth SDK has an error and is buffering an extra frame when it shouldn't. Maybe the body tracking SDK has an error and is buffering an extra frame when it shouldn't (in fact, Chris already found that it was doing exactly that). And maybe the bodytracking neural network is designed to 'cheat' by returning the result of the 2nd-most-recent (or older) frame, because it can give a more accurate output if it waits for a subsequent frame(s) to check its answer.

The question has little to do with the processing time - I think if you watch the video Chris posted to https://youtu.be/7Jc7KhoPWdc you will agree that the body tracking latency exceeds any possible processing time. Even with the fix Chris later found that resolves one of the two excess frames, the latency still far exceeds any possible processing time because the system waits through that 2nd extra frame. 80ms is expected, not 280ms.

Regardless, after seeing that Microsoft has not repeated the simple test that Chris showed in the video, and the length of time that the issue has persisted, we stopped trusting that Microsoft would ever address the issue. We have switched to a different depth camera supplier. We would like to use Microsoft's cameras and could go back to them, but we cannot have poor body tracking latency, and we cannot get the certifications we need if we cannot explain the hardware performance.

Chris asked me to add: the body tracking latency is better at 15FPS than at 30FPS, at least in our tests. There is no reason for that to happen.

fractalfantasy commented 4 years ago

> I do not believe that the latency addressed by this issue, #816, is related to Body Tracking latency.
>
> Are there any updates on this issue from anyone?

yes you would think so.. but @qm13 has closed #514 and stated it's being tracked here now, so this thread actually does cover body tracking latency and performance now.

Greendogo commented 4 years ago

> > I do not believe that the latency addressed by this issue, #816, is related to Body Tracking latency. Are there any updates on this issue from anyone?
>
> yes you would think so.. but @qm13 has closed #514 and stated it's being tracked here now, so this thread actually does cover body tracking latency and performance now.

@qm13 To my understanding, the latency of the camera in general and the latency of the body tracking are two separate issues. The camera's general latency seems to be unrelated to the problem of the type of model used in body tracking.

If these two separate issues are really both being tracked in #816, then this is definitely a bad idea. It will devolve into us shouting over each other to get our issues attention. I suggest the developers reopen issue #514 immediately to give both of these issues the attention they deserve.

#514-specific conversation is already polluting this discussion; prior to that, a customer said their company can no longer use this camera because the original latency issue is not being addressed, and now the resolution they seek will be harder to achieve as a direct result of these issues being merged.

qm13 commented 4 years ago

@Greendogo I agree with you. My mistake. I have reopened #514.

vpenades commented 4 years ago

@AutomationIntegration, @Chris45215 I've seen the video you produced to showcase the latency. Certainly, it is quite discouraging...

But I would also like to ask whether you took into account display frame delay. Typically, graphics engines use double buffering, so they display one frame behind the latest rendering. It is true that this latency should be fairly small, given that at the very least the frame time of an average monitor is 1/60 of a second - much less on pro monitors.

qm13 commented 4 years ago

The Sensor SDK is entirely separate from the Body Tracking SDK. We recognize that there are performance issues in both SDKs. Using a single E2E experience to demonstrate an E2E performance issue makes it hard to isolate the bottlenecks. Can we please use this issue to focus on sensor performance - the time from light entering the sensor to the time a capture is returned. We will use #514 to track body tracking performance.

shevart75 commented 4 years ago

Any progress?

wes-b commented 4 years ago

Thanks for the ping. From an Azure Kinect SDK point of view the latency issue is resolved with the above comment https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/816#issuecomment-583161328.

Internally we have been using this bug to track Sensor SDK performance, and #514 for Body Tracking performance. With no more comments about the Sensor SDK specifically in over a month I am closing this issue. If a new question or concern is raised for the sensor SDK then please feel free to open a new issue.

At this time #514 is still open and tracking BT SDK performance.

Greendogo commented 4 years ago

@qm13 @wes-b Please don't close this; the problem is not fixed. I think the reason there have been no comments in the past month is that we're still waiting for you to fix it and nothing has changed.

The comment you made is not the fix we are waiting for; it is an optimization.

This doesn't change the fact that the camera is super latent.

Our use case: 2 synced cameras. We only really care about depth, but we can't run and sync them without having the color camera running. This is a limitation on your side.

Our setup: Color camera YUY2, 1280 x 720

Depth Camera WFOV Binned, 512 x 512

Running at 5 FPS

Expectation: Latency should be less than 30ms behind reality. It is currently more than 90ms.

I'm surprised that you keep saying that this is a no fix. It is broken for real-time applications, which are most of the K4A camera's applications.

@wes-b you've hinted that the GPU cycle has some number of frames queued up - a potential source of the undesired latency. Have you considered using the CPU instead of the GPU? That should save you some time (presuming the trips to the GPU are causing the issue here).

This is not fixed until this latency is under 30ms behind reality, as mentioned before. It is avoiding the problem if this is labelled a NO FIX. Maybe get a product person to explain HERE why the decision has been made not to fix this. You're basically deciding that this broken thing is "not broken", which is just not true.

wes-b commented 4 years ago

@Greendogo how are you measuring 90ms? The data I captured and shared shows 59ms for YUY2 from the start of exposure to the availability of the image via k4a_device_get_capture. That also uses the worst-case exposure of 33ms, so the image is available less than 30ms after the exposure ends. For WFOV binned we are at 33ms from start of exposure to presentation at k4a_device_get_capture.

Greendogo commented 4 years ago

@wes-b Our data on that is quite old and we will be re-measuring soon.