I understand that the body tracking is only a preview release and will likely be improved over time. As a stopgap, if it's possible, perhaps the body tracking could expose a parameter that can be tuned to achieve higher accuracy for offline or remote processing on powerful systems, for non-real-time applications.
There are many use cases that might only need a 3-4 second clip and could afford to spend minutes/hours processing, but can't afford meter+ gaps between actual and reported joint positions.
Fully agree!
I have a feeling the output data is also overly filtered/smoothed somewhere in the pipeline, possibly at the model-fitting stage after the DNN. Maybe for certain use cases it is useful to favor plausible poses over per-frame accuracy, but a user setting to tune this behavior could definitely improve usability in many more scenarios.
Another addition could be to output the 2D joint positions straight from the DNN, for cases where we need accurate points on the IR frame prior to any model-fitting and post-filtering stages.
I've already put these 2 suggestions on the feedback page, in case you want to upvote: https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38027029-user-adjustable-skeleton-smoothing https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38031535-2d-joint-positions
Thank you for the feedback. Please note that the Body Tracking SDK is currently a preview release and we are actively working on many aspects of quality. The installation process and dependencies are a pain point that we are aware of and are working to fix.
As of the preview release the minimum recommended card is a GTX1070: https://docs.microsoft.com/en-us/azure/kinect-dk/system-requirements
Please keep sending us your feedback as new releases come out.
@Brekel I upvoted your first suggestion a few days ago, and now the second one as well. Anyway, I don't think tuning a single hyperparameter would solve the lag issue. DNNs are notoriously slow, even on high-end graphics processors.
Thanks for the upvote. Maybe you're right, but I think what I'm seeing on my GTX 1080 Ti is not lag/slowness but filtering - I could be wrong.
In any case, the DNN is still amazingly robust, so this is not a critique - just trying to help the team in making it more awesome and useful for more scenarios. The more controls we have, the easier that will be.
@cdedmonds Thank you for the swift response! I know the system requirements, and that the BT SDK is still in preview. But I think it is better (for both you and us) to give such feedback now, while it's in preview, than after the official release is out.
I suppose the Azure Kinect team is much different from the team that developed the Kinect v2 sensor and SDK, but you should still have access to all the documentation and internal info from back then. It may be worth comparing the K4A body tracking to the body tracking of Kinect v2 in terms of performance, accuracy, features, extras, etc. It would be good to end up with a better performing, more accurate and more versatile body tracking than the previous one.
Please keep this issue open, so we could comment when the new BT SDK releases come out. When do you plan to release the next preview version, by the way?
We appreciate the feedback. Keep it coming.
@cdedmonds I would say the key part is exposing as much control as possible to make things adaptable to a wide range of scenarios.
For example:
Again, I don't know what is possible with the underlying algorithms and am just guessing, but the more that can be exposed, the wider the use cases. We're clever programmers using this, you know :)
I definitely agree that if you can expose more control parameters, more possible use cases open up in a non-linear fashion, because then developers can work around the limitations by adapting their workflow, or by combining outputs from processing a stream more than once. I think this is particularly true for a device like the Kinect, which feels like a revolutionary "general purpose" technology with a lot of applications that haven't really been developed before. For instance, we would love to have low-latency, low-computation positional accuracy. However, if that's not possible, we can always send the mkv files for remote processing on cloud GPUs and then send the data back to the end user.
Absolutely! Although the cloud is not always an option, since body tracking deals with people, and possible privacy issues may require local compute. :)
@cdedmonds Please also look at what the competition is doing: https://developer.apple.com/videos/play/wwdc2019/607
For my use cases, the tracking latency and the ability to recognize a person are the key concerns - positional accuracy within 6 inches (roughly 15cm) is sufficient. Multiple sensors can be used to increase accuracy, but there is no way to get around the latency and tracking delay. 60fps depth tracking would probably help alleviate the problem, but if that's not possible with the hardware then reduced latency is a priority. We can always add more cameras, but we can't subtract time. Though I note it would be easier to add more cameras if they didn't each require a PC with a Nvidia card; Intel NUCs would take far less space, and in fact the Azure Kinect fits atop them perfectly.
I must admit, v0.9.2 brings significant improvements in terms of performance and accuracy. In this regard, I have a question: did you change the model as well, or only the SDK internals?
@rfilkov Is there an MSI for 0.9.2? I can't seem to get my hands on it.
@rfilkov v0.9.2 includes an updated model.
The new model does seem slightly better.
The main issue for us is that the model does not output confidence scores for the joints. Most 2D pose estimation models have this feature, which makes it easy to deal with situations where the model has to fail (e.g. an occluded joint). Is this something that can be added? I'll start a new issue with a feature request.
I agree about the confidence score for each joint. This improvement seems to be under evaluation (https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38166871-joint-tracking-state). Another big limitation is the hardware requirement. We want to work on the move with a laptop, but such hardware (e.g. a GTX 1070 or RTX 2070) needs too much power to run in mobility (in battery mode). We think it is very important to add more configuration possibilities for the body tracking - for example, a lighter model compatible with lower-grade graphics cards (a GTX 1650, for example).
@hmexx Here is a feature request in this regard: https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38166871-joint-tracking-state Please upvote or comment there, too.
@PierrePlantard Please upvote this feature request, as well: https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38129473-body-tracking-without-cudnn
Already done a few weeks ago, @rfilkov :)
Regarding pose smoothing, I never understood why the low-level APIs deliver an already-smoothed skeleton.
Skeleton smoothing is relatively inexpensive and easy to compute. In fact, with Kinect v2 we developed our own skeleton-smoothing algorithms based on our needs.
In other words, smoothing the skeleton is a post-process that can be done either with an additional call to a "SkeletonToolkitAPI" provided by the body tracking SDK itself, or by a third-party library.
So there's no point in the output being smoothed by default.
You can turn it off in the latest v0.9.2 release. It proves that the more settings/controls we have in the sensor/tracking SDK, the better it can be adapted to different use cases.
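For what it's worth, here is a minimal sketch of both halves of that idea, assuming `k4abt_tracker_set_temporal_smoothing` is the knob that release exposes: turn the built-in smoothing off, then apply your own filter as a post-process, tuned to your needs.

```c
#include <k4abt.h>

// Disable the SDK's built-in temporal smoothing so raw per-frame
// estimates come through (0 = none, 1 = maximum smoothing).
// Call once after the tracker is created.
void disable_builtin_smoothing(k4abt_tracker_t tracker)
{
    k4abt_tracker_set_temporal_smoothing(tracker, 0.0f);
}

// A trivial custom post-process: per-joint exponential smoothing.
// 'alpha' trades responsiveness (1.0) against stability (toward 0.0).
// 'state' holds the smoothed skeleton from the previous frame.
void smooth_skeleton(const k4abt_skeleton_t* current,
                     k4abt_skeleton_t* state, float alpha)
{
    for (int i = 0; i < (int)K4ABT_JOINT_COUNT; i++)
    {
        const k4a_float3_t* c = &current->joints[i].position;
        k4a_float3_t* s = &state->joints[i].position;
        s->xyz.x = alpha * c->xyz.x + (1.0f - alpha) * s->xyz.x;
        s->xyz.y = alpha * c->xyz.y + (1.0f - alpha) * s->xyz.y;
        s->xyz.z = alpha * c->xyz.z + (1.0f - alpha) * s->xyz.z;
    }
}
```

A production filter would also blend orientations and handle dropped joints, but even this simple version lets each application choose its own latency/stability trade-off instead of inheriting the SDK's.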
@PierrePlantard I've used the Azure Kinect for body tracking with a GTX 1060 and it works, using at most about 20% of the GPU according to Windows Task Manager. I've not compared its performance to a GTX 1070 or greater, so I can only confirm that it does run and returns results. It is noticeably slower than the tracking of the Kinect v2, but that seems to be true even with the GTX 1080.
The sensor is intended to track multiple people but my tests were done with only a single person, so it is possible that the 1070 specification is a high estimate to ensure it can track 6 or more people simultaneously.
Hi all, may I know whether the latest model in Body Tracking SDK v0.9.2 solves the slowness and inaccuracy issues? I can't find any feedback or comments on this latest version so far.
@Chris45215, yes, the body tracker also works on my laptop with a GTX 1050 Ti, but only at 15 fps. When I put my laptop in battery mode, the frame rate drops to 5 fps due to power throttling. Such a low frame rate induces significant lag, which is unacceptable for my use case. A good configuration feature (available for many deep learning models) would be the possibility to choose between accuracy and the computing power required.
@JMonkey94 - In my opinion, and for my application, 0.9.2 is a definite improvement, but it is still too inaccurate. We are doing offline processing, so speed isn't an issue regardless.
We released 0.9.3 last Friday. It includes a brand new ML model (v2), improved 3D model fitting, smoothing off by default, and improved performance. We'd love to hear from the community on whether this release improves your apps.
Hey there.
We just tried your new model and it is definitely improved. Well done! Unfortunately, without joint confidence it's still unusable for us. Even with this improved model, when a joint is occluded it ends up in a random location. A low joint confidence would allow us to spot and deal with such situations accordingly - for example, by not visualising the joint or by prompting the user to take some action.
Same situation. I appreciate a better model, of course, but when fusing joint data from multiple Kinects in a room, I don't want to fuse false joint data from joints that are occluded.
@qm13, I compared v0.9.2 to v0.9.3 back-to-back with the bundled Body Tracking Viewer app, and the tracking latency (the time delay between reality and the output being shown on screen) for both versions is 0.25 seconds. This appears to be due almost entirely to a 0.2 second delay between the camera sensor (whether using RGB or depth) and transmission to the PC - it's as though the camera is buffering 5 or 6 frames internally, for no reason. For comparison, the Kinect for Xbox One has a body tracking latency of around 0.12 seconds, and half that for RGB and depth sensor data. For more modern comparisons, the Oculus Rift S's cameras have about a 0.02 second latency for room tracking, and some of the current Intel RealSense cameras boast a 0.006 second latency for similar tasks - that's not body tracking, but it shows how quickly data can get from the camera sensor to a processed output.
I made this simple test procedure anyone can use to evaluate performance, most notably the latency and lag:
1. With the Kinect plugged in, start the bundled Azure Kinect Body Tracking Viewer app. Maximize the output window.
2. Stand in front of the Kinect while facing the computer screen; ensure you can clearly see the screen.
3. Pull out your phone, start a camera or video camera app, and set it to the highest frame rate available.
4. Tell your phone to begin recording, and point it at your computer screen so it can see the body tracking output. We'll assume you are holding the phone with your right hand.
5. Reach out with your left hand (or whichever hand isn't holding the phone), and ensure that you can clearly see your extended arm on the Body Tracking output and can also see that hand on your phone's camera.
6. Suddenly and rapidly pull that extended hand downwards, while holding your phone camera steady.
7. Repeat steps 5-6 a few times, then end the recording.
This gives you good, well-timed footage in which you can count the frames in your phone's video between your real movement and the Kinect's detection of that movement. My phone records at 60 fps, and there were 15 frames (0.25 seconds) between my movement and the Kinect's detection of it. I tried this again in the bundled Azure Kinect Viewer 1.2 with each of the RGB and depth outputs, and those had delays of 0.20 seconds - better, but not by much. For a sanity check and as a control, I did the same test while filming the computer mouse and the cursor movement across the screen - that yielded a lag of about 1 frame at 60 fps, or 0.016 seconds, confirming that my monitor is sufficiently responsive and not the cause of the delay.
It seems that the problem may not be the tracking algorithm itself, but rather a very long processing time and delay for the camera data overall - assuming that the depth image output isn't being delayed to match the skeleton tracking. A delay that long is a non-starter for my application, as we aim for 0.05 seconds but can stretch a bit beyond 0.1 seconds if the tracking is perfect.
I performed these tests on a PC with an AMD 2700X CPU and GTX 1060 GPU. Even if the GPU is below the recommendation, that shouldn't affect the 0.2 second latency between the camera sensors and the software output.
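One way to put a number on part of that delay without filming the screen is to check how old each frame already is when your code receives it. A rough sketch, assuming Sensor SDK 1.2+, where `k4a_image_get_system_timestamp_nsec` stamps each image with the host's monotonic clock at USB arrival (the device timestamp uses a different epoch, so by itself it only shows jitter, not absolute delay):

```c
#include <k4a/k4a.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// How long ago did this frame arrive at the host driver? Comparing the
// image's system timestamp (host monotonic clock at USB arrival) with
// the current monotonic time exposes host-side queuing/processing delay.
static void report_frame_age(k4a_capture_t capture)
{
    k4a_image_t depth = k4a_capture_get_depth_image(capture);
    if (depth == NULL)
        return;

    uint64_t arrived_ns = k4a_image_get_system_timestamp_nsec(depth);

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // Linux; use QueryPerformanceCounter on Windows
    uint64_t now_ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;

    printf("frame age on host: %.1f ms\n", (now_ns - arrived_ns) / 1e6);
    k4a_image_release(depth);
}
```

This can't see the camera's internal buffering, but it does separate "the frame arrived late" from "the frame sat in a queue on the PC", which helps isolate where the 0.2 seconds go.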
@hmexx @RoseFlunder please see https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/734 re joint pose confidence.
@Chris45215 thank you for the excellent perf analysis. I have flagged your comment to the Sensor SDK team. You also may want to republish your analysis against the Sensor SDK.
v0.9.3 is a significant improvement in body index map accuracy. There are still some inaccurately classified pixels in the area of the fingers and around the body edges, but well done so far.
Regarding performance: I don't see a major performance boost in comparison with v0.9.2, but maybe I'm missing something, or it's due to my Nvidia GTX 1060. Indeed, the suggestion of @Chris45215 is great for measuring the overall performance. It should be applied to the Azure Kinect Viewer, as well.
Adding pose confidence levels would also be a significant improvement on its own. I hope you can add the hand joints and hand states soon, too. This is already work in progress, as far as I can see, and that gives hope for further improvement.
One cosmetic request: could you please modify the Body Tracking Viewer in the next release to start full screen instead of in a small window?
@qm13 - 0.9.3 seems to have improved the tracking for the arms, but to me it actually looks a bit worse for the legs. In 0.9.2 and before, the standing leg was accurate while the extended leg was snapped back to the standing one - in 0.9.3 it seems to be 'splitting the difference', and both legs are now shown venturing into the empty area between them.
I have tested this both with pre-recorded MKV files (in several directions) and in the newly downloaded 3D viewer.
Please see an example in issue 738 where I redid the analysis of a previously recorded file with 0.9.3.
Hey, does anyone know how to record body tracking as an .mkv file? Can we use k4arecorder to record it?
No, the mkv file from the k4arecorder doesn't contain body tracking information, but it has the color, depth and IR tracks. So if you open the recording with the playback API from the SDK, you can feed the captures into the body tracker just as you would when working with a real device.
Maybe you could check whether the recording API can add custom tracks; I haven't used it yet.
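For anyone searching later, the flow described above looks roughly like this - a sketch with error handling omitted, assuming the recording contains the depth and IR tracks the tracker needs:

```c
#include <k4a/k4a.h>
#include <k4arecord/playback.h>
#include <k4abt.h>
#include <stdio.h>

// Feed a recorded .mkv through the body tracker offline.
void track_recording(const char* path)
{
    k4a_playback_t playback;
    k4a_playback_open(path, &playback);

    // The tracker needs the calibration the recording was made with.
    k4a_calibration_t calibration;
    k4a_playback_get_calibration(playback, &calibration);

    k4abt_tracker_t tracker;
    k4abt_tracker_create(&calibration, K4ABT_TRACKER_CONFIG_DEFAULT, &tracker);

    k4a_capture_t capture;
    while (k4a_playback_get_next_capture(playback, &capture) == K4A_STREAM_RESULT_SUCCEEDED)
    {
        k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE);
        k4a_capture_release(capture);

        k4abt_frame_t body_frame;
        if (k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
        {
            printf("bodies: %u\n", k4abt_frame_get_num_bodies(body_frame));
            // ... store skeletons alongside the frame timestamp here ...
            k4abt_frame_release(body_frame);
        }
    }

    k4abt_tracker_shutdown(tracker);
    k4abt_tracker_destroy(tracker);
    k4a_playback_close(playback);
}
```

Since it runs offline, the tracker can go slower than real time without dropping anything, which also makes it a fair way to compare model versions on identical input.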
@yijiew @qm13 Thank you for the latest update, v0.9.4, featuring hand and finger joint tracking, as well as the joint tracking states! These are all very helpful.
What I noticed recently is that the handtip joints track the fingers pretty well. There are some misses from time to time, but generally the fingers are well tracked. On the other hand, the thumb joints don't track the thumbs so well, especially when the user raises his/her forearms. Is there any particular reason for this?
By the way, do you plan to add hand states (open hand/closed hand) soon, as well?
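In the meantime, for others landing here: the new per-joint state can already be used to drop untrusted joints. A small sketch, assuming the `confidence_level` field that v0.9.4 added to `k4abt_joint_t`:

```c
#include <k4abt.h>
#include <stdio.h>

// Skip joints the tracker itself is unsure about (e.g. occluded ones)
// instead of rendering their guessed positions.
void print_reliable_joints(k4abt_frame_t body_frame)
{
    uint32_t num_bodies = k4abt_frame_get_num_bodies(body_frame);
    for (uint32_t i = 0; i < num_bodies; i++)
    {
        k4abt_skeleton_t skeleton;
        if (k4abt_frame_get_body_skeleton(body_frame, i, &skeleton) != K4A_RESULT_SUCCEEDED)
            continue;

        for (int j = 0; j < (int)K4ABT_JOINT_COUNT; j++)
        {
            const k4abt_joint_t* joint = &skeleton.joints[j];
            if (joint->confidence_level < K4ABT_JOINT_CONFIDENCE_MEDIUM)
                continue; // occluded or out of range - don't trust the position

            printf("body %u joint %d: (%.0f, %.0f, %.0f) mm\n", i, j,
                   joint->position.xyz.x,
                   joint->position.xyz.y,
                   joint->position.xyz.z);
        }
    }
}
```

This is exactly the multi-Kinect fusion case mentioned above: joints below a chosen confidence threshold can simply be excluded from the fusion instead of dragging the merged skeleton toward a guessed position.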
Closing as performance issue being tracked in #816
Dear Support Team, we have the Azure Kinect DK body tracking sensor, but its performance is low compared to the Kinect v2 in CPU mode. We would like body tracking that works similarly to the Kinect v2. Please advise.
I've created a feature request to address the latency, inaccuracy and slow speed of Azure Kinect body tracking by implementing the Kinect v2 'random forest' model as an option. Please upvote:
cc: @Chris45215 @PierrePlantard @vpenades @JMonkey94 @RoseFlunder @pratikgd
What is the technical difference (in AI features) between the Azure Kinect's and the Kinect v2's body tracking, other than that the Azure Kinect uses a DNN model to derive the skeleton while the Kinect v2 uses a 'random forest' model?
After patiently waiting for months, I've come to this conclusion, as one final request. https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/40188511-please-replace-the-body-tracking-sdk-team
The Body Tracking SDK is entirely separate from the Sensor SDK. We recognize that there are performance issues in both SDKs. Using a single E2E experience to demonstrate an E2E performance issue makes it hard to isolate the bottlenecks. Can we please use this issue to focus on body tracking performance - the time taken from enqueuing a capture to popping a result? This should take no more than ~30ms on the recommended body tracking minimum hardware for 30fps. We also realize this is a high hardware bar and are working on a lite DNN model. We will use #816 to track sensor performance.
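For measuring exactly that enqueue-to-pop number, a minimal sketch (it times a single blocking round trip, which isolates per-frame cost; a steady-state pipeline would interleave enqueues and pops):

```c
#include <k4abt.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // Linux; use QueryPerformanceCounter on Windows
    return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL;
}

// Time the body tracking step alone: enqueue one capture and block
// until its result pops. Anything consistently above ~33 ms means the
// tracker cannot keep up with a 30 fps stream on this machine.
uint64_t time_tracking_ms(k4abt_tracker_t tracker, k4a_capture_t capture)
{
    uint64_t start = now_ms();

    k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE);

    k4abt_frame_t body_frame;
    if (k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
    {
        k4abt_frame_release(body_frame);
    }
    return now_ms() - start;
}
```

Averaging this over a few hundred frames of a recorded clip gives a reproducible figure that can be quoted against the ~30 ms target.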
Happy to see this thread reopened, and it's great that you are working on a lite DNN model.
Right now these are the main issues with body tracking performance (I can open up a separate ticket for each one but they all seem to fit this thread title):
1) FPS
2) Latency
3) Cross-platform compatibility
4) Resource hogging (GPU)
5) Accuracy (dynamic movement)
6) Power efficiency
I'm concerned that a lite DNN model will only solve half of these issues at best.
Would porting over a random forest option hinder the development of the DNN model?
I know I sound like a broken record here, but I think it would solve all of the above and would let you guys develop the DNN model in peace. It's also quite popular on the feedback forum.
EDIT: I know the team internally debated random forest vs DNN and chose the latter - what I'm talking about is an "option" to use random forest, not a replacement. I'm also assuming a minimal port would take a good developer about an afternoon; I could be wrong about that, so feel free to correct me.
@fractalfantasy - Thank you for breaking down the issues. Obviously, every use case is different. However, I would think accuracy is likely the most important, because the others can generally be solved, albeit in sub-optimal ways. For instance, resource utilization can be solved by adding more resources. For accuracy, adding computing power only delivers incorrect results more quickly. Power efficiency only adds a slight cost to the end user - accuracy completely halts a project.
@billpottle no probs and thanks for following up :)
Yes, I agree that adding more resources would likely solve low FPS and would help with resource hogging, but it would not necessarily solve latency or cross-platform compatibility.
Regarding power efficiency: the Azure Kinect is designed for industrial applications, most of which use small form-factor computers like the Arduino or Nvidia Jetson... so you'll need body tracking to work efficiently on those systems. It also seems counter-intuitive and environmentally unethical to have a bunch of RTX 2080s running full-tilt in a warehouse all day.
We are developing an app that uses the Kinect as a musical instrument and visual generator, so latency is an absolute deal-breaker, and the GPU needs to be kept free for visual processing. The old Kinect v2 body tracking was perfect for us - and still no one has answered how porting over a legacy Kinect v2 body tracking option would hurt development of the DNN model.
We plan on releasing our app to the public, so keeping the hardware requirements inclusive is important. If you look at the big picture, keeping hardware requirements low for k4a apps will result in widespread adoption, leading to more sales for the Azure Kinect.
@billpottle You're mistaken about the accuracy/computing-power trade-off.
As stated by some users, high-end computers with a lot of power are simply not an option for many business models, including ours. In many cases, it's hardware cost that halts the project.
Once you accept that the only valid platform is a low-end computer, you have a choice between:
- a high-quality tracker running at a low frame rate, or
- a lower-quality tracker running at a high frame rate.
And by "low quality" I mean no less than what the Kinect v2 already delivers.
Now, why would someone choose "low quality" at high frame rates? Because for Natural User Interfaces it is CRITICAL to be able to pick up fast movements or gestures, even if some limb is occasionally lost. With a high-quality tracker at a low frame rate you miss ALL the fast movements, which actually results in much worse overall tracking quality.
Any decent body tracker to be used for NUI should run at 30fps, regardless of the processing power available.
@qm13 @wes-b Specifically, how would porting Kinect v2 body tracking over as a legacy OPTION hurt development of the DNN model?
I beg your pardon in advance for the long and somewhat off-topic post. The current Body Tracking SDK is almost unusable on most of the devices where Kinect v2 was running smoothly (*). The result of this, especially in these COVID-19 times, is that we almost cannot use the Kinect for Azure for all those home-based applications that are currently desperately needed to keep people active and in shape. Let me humbly suggest that the K4A development team consider opening a development branch of the previous Kinect v2 SDK for use with the new Kinect for Azure, of course paying the price of less accurate tracking. I would say that for most interactive applications it was already fairly good (we are happily using Kinect v2 + the old SDK in a rehab center for children). Other solutions (e.g. based on the Intel RealSense D435 or D415), much simpler and less effective from many points of view, are already in place, and if there are no changes in the current Kinect for Azure SDK development plans regarding the use of GPU-intensive DNNs for body tracking, many may switch from the Kinect to those.
@fractalfantasy @billpottle @vpenades @dotslinker I love the passion, and the Azure Kinect team wants to address your issues. The team was recently reorganized into the new AI Platform team within Azure. This and COVID-19 have somewhat impacted productivity. Please bear with us. We want to make it possible for you to bring your applications to life with Azure Kinect. Here are responses to your questions/points (hopefully I have not missed any). Starting with @fractalfantasy's excellent breakdown of the overall problem:
Describe the bug
By all means the Azure Kinect is the best Kinect so far, and will probably be the best depth sensor on the market. The Sensor SDK is pretty stable and good, providing almost everything an average user would want. But the body tracking subsystem is ruining this positive user experience. In terms of API this SDK is great too, but the DNN model's performance is much worse than the body tracking of Kinect v2. The joint positions are inaccurate during fast movements. The body index map is not very accurate either; it does not fully match the user's silhouette on the depth frame. On my GTX 1060 it takes 2-3 depth frame cycles to process a body frame, hence it works at about 10 fps.
Expected behavior
Please consider at least providing some option for users who don't have high-end graphics cards and would like to get Kinect body tracking out of the box, without (or with minimal) extra installations. As far as I remember, Kinect v2 used random forest model(s) for its body tracking. The performance was great and no extra installations were needed, back then in 2013/14.
Additional context
I believe most Kinect users would expect a better, more accurate and more performant body tracking experience, not a worse one. And now, with Apple adding people segmentation and body tracking to ARKit 3.0, I would expect Kinect (with all these years of experience) to provide a better user experience in all aspects than anybody else.