opentrack / opentrack

Head tracking software for MS Windows, Linux, and Apple OSX

Easy Tracker for Kinect and other devices #915

Closed: Slion closed this issue 5 years ago

Slion commented 5 years ago

The focus of this issue changed from trying to make the original Point Tracker work with Kinect to implementing a new generic Easy Tracker primarily designed for Kinect.

New Idea

We are implementing Easy Tracker using OpenCV's cv::solveP3P to estimate our pose. Later on we may consider supporting several pose estimation solutions; for instance, Easy Tracker could support 3-point model tracking and OpenCV face tracking. Branch: opencv-point-tracker
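
A minimal sketch of what the cv::solveP3P call could look like, assuming OpenCV 3.3+ for cv::SOLVEPNP_AP3P and that the model vertices and extracted blob centres are matched in the same order. Names are illustrative, not the actual Easy Tracker code:

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Illustrative sketch only. object_points: 3 model vertices;
// image_points: the matching extracted blob centres, in the same order.
bool estimate_pose(const std::vector<cv::Point3f>& object_points,
                   const std::vector<cv::Point2f>& image_points,
                   const cv::Matx33d& camera_matrix,
                   cv::Mat& rvec, cv::Mat& tvec)
{
    std::vector<cv::Mat> rvecs, tvecs;
    // solveP3P can return up to 4 candidate solutions; a real tracker
    // would pick the one most consistent with the previous frame's pose.
    int n = cv::solveP3P(object_points, image_points, camera_matrix,
                         cv::noArray(), rvecs, tvecs, cv::SOLVEPNP_AP3P);
    if (n < 1)
        return false;  // no solution for this frame
    rvec = rvecs[0];   // rotation as a Rodrigues vector
    tvec = tvecs[0];   // translation, in the model's units
    return true;
}
```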

We are currently using the original Point Tracker solution to extract points from the image. At some point we ought to replace it with something like the Ball Tracking using Kalman Filter approach; I reckon that should solve our filtering issues. Filtering on the frame is certainly the best way to go about it, albeit probably not the cheapest one in terms of CPU usage.
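
A rough sketch of that per-point Kalman filtering, along the lines of the OpenCV ball-tracking example (constant-velocity model; the noise covariances are placeholder tuning values):

```cpp
#include <opencv2/video/tracking.hpp>

// One filter per tracked point; state is (x, y, vx, vy), measurement is (x, y).
cv::KalmanFilter make_point_filter()
{
    cv::KalmanFilter kf(4, 2);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-4));     // placeholder tuning
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1)); // placeholder tuning
    return kf;
}

// Per frame: predict, then correct with the extracted blob centre.
cv::Point2f filter_point(cv::KalmanFilter& kf, cv::Point2f blob)
{
    kf.predict();
    const cv::Mat est = kf.correct((cv::Mat_<float>(2, 1) << blob.x, blob.y));
    return { est.at<float>(0), est.at<float>(1) };
}
```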

Original Idea

Point Tracker fails to recover after a spell with a lot of noise in your IR frame buffer. That's typically the case when you are too close to your Kinect, for instance. An easy workaround for such issues is to implement an auto-reset feature.

It could be done by adding an "auto reset" checkbox in our settings and possibly some user-defined parameters such as:

Should we somehow check the mapping settings for those parameters rather than adding new ones?

sthalik commented 5 years ago

I'd rather avoid the checkbox and do something sensible when dynamic pose is used. The parameters can be conservative, e.g. over 80° pitch, 60° yaw, or 50° roll.

sthalik commented 5 years ago

You can also reset if non-absolute value of TZ is less than 10.
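
Combined, a sketch of the check could look like this (names, units, and thresholds are only suggestions; TZ assumed in centimetres):

```cpp
#include <cmath>

// Sketch of the auto-reset condition: conservative angle limits plus
// the signed TZ check suggested above. Values are suggestions only.
bool should_reset(double pitch_deg, double yaw_deg, double roll_deg, double tz_cm)
{
    return std::fabs(pitch_deg) > 80 || std::fabs(yaw_deg) > 60 ||
           std::fabs(roll_deg)  > 50 || tz_cm < 10;
}
```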

Slion commented 5 years ago

I'm afraid the workaround won't do. For some reason last night I could use opentrack with the workaround, but today I get so much rubbish from the tracker that it's not manageable. It keeps happening even though the frames are perfect. We will have to get to the bottom of it. It makes me wonder how anyone can use it in that cap configuration. Looks like I'm gonna have to set aside a weekend to redo the maths.

[image]

@sthalik Any chance you could do a release that I could try, to confirm this is not an issue with my build?

sthalik commented 5 years ago

Don't you have dynamic pose disabled in that particular case? It needs to be always enabled for caps.

I can do a release, just let's get that out of the way.

Slion commented 5 years ago

Don't you have dynamic pose disabled in that particular case?

It was enabled.

sthalik commented 5 years ago

The points seem to be detected correctly. Note that the cyan circle outlines are in the right places. Can you confirm with a webcam? And if your image is bad, increase "dynamic pose timeout".

Slion commented 5 years ago

The points seem to be detected correctly.

It seems so. The issue is not with the point extraction but with the pose computation itself.

Can you confirm with a webcam?

You mean confirm the point extraction is correct? I pretty much know it is. I can change the scaling during the bitmap transformation so that you see a proper black-and-white image and not just the IR reflectors.

And if your image is bad, increase "dynamic pose timeout"

I guess I could try changing that parameter.

Slion commented 5 years ago

I'm under the impression the problem occurs only past a certain FOV. The Kinect IR camera reports a diagonal field of view of 89.5°, so I was setting it to 89. If I set it to 80 the problem still occurs. At 75 it still occurs. However, at 70 it looks like I can't reproduce the issue anymore. Somehow the tracker gets messed up if the FOV is too high. Even more interesting, if you set the FOV to 90 the problem kicks in spontaneously a couple of seconds after resetting/centring. Though if I'm further away from the camera, even at FOV 90 the problem does not kick in. It is therefore a combination of FOV and model distance from the camera that triggers the issue. That would also explain why I was able to play ED for over an hour with an FOV of 89 without much issue. In fact, when playing ED I'm further away from the Kinect than when developing.

A side effect of lowering the FOV is that our maths are a little off. It's very noticeable when looking at the translation vector, specifically the Z coordinate. Nevertheless, it looks like this could be a good workaround until we fix our maths.

Could it all be caused by a loss of precision somewhere?

Slion commented 5 years ago

Alright, the problem is with PointTracker::POSIT choosing the wrong solution of the two it computes.

Could it all be caused by a loss of precision somewhere?

It certainly looks like it. When computing the solutions' deviations we cast them from double to float before comparing them. Bad idea!

Slion commented 5 years ago

It's possibly better without the cast to float, but you can still have situations where the wrong solution is picked.

Slion commented 5 years ago

After more testing I doubt switching from float to double brings anything. I could pretty much work around the wrong-pose issue by adding X_CM_expected = {}; at the beginning of PointTracker::POSIT, but even then the pose is somewhat wrong. It keeps adding pitch and roll when you only turn your head left and right. You can compensate with mapping but that's far from ideal.

I'm afraid no matter what I do, the algorithm implemented by Point Tracker is not exactly Kinect friendly. The best way forward is probably to branch off Point Tracker into a new module and implement tracking and pose estimation using OpenCV APIs. See cv::solveP3P. See also ICoordinateMapper::GetDepthCameraIntrinsics from the Kinect API: it provides the inputs for the OpenCV camera matrix as well as some distortion coefficients. The objectPoints and imagePoints parameters must contain the points in the same order.
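
A sketch of how the Kinect intrinsics could be fed to OpenCV, assuming the Kinect for Windows SDK v2 CameraIntrinsics fields (the exact mapping would need double-checking):

```cpp
#include <opencv2/core.hpp>
#include <Kinect.h> // Kinect for Windows SDK v2

// Sketch: map the depth camera intrinsics onto the camera matrix and the
// (k1, k2, p1, p2, k3) distortion vector OpenCV's solvers expect.
bool get_camera_matrix(ICoordinateMapper* mapper,
                       cv::Matx33d& camera_matrix,
                       cv::Vec<double, 5>& dist_coeffs)
{
    CameraIntrinsics i = {};
    if (FAILED(mapper->GetDepthCameraIntrinsics(&i)))
        return false;
    camera_matrix = cv::Matx33d(i.FocalLengthX, 0, i.PrincipalPointX,
                                0, i.FocalLengthY, i.PrincipalPointY,
                                0, 0, 1);
    // Kinect reports only radial terms, so the tangential p1/p2 are zero.
    dist_coeffs = { i.RadialDistortionSecondOrder,
                    i.RadialDistortionFourthOrder,
                    0, 0,
                    i.RadialDistortionSixthOrder };
    return true;
}
```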

sthalik commented 5 years ago

After more testing I doubt switching from float to double brings anything.

Are your cap dimensions standard? Can you see if pitching the actual camera up or down helps any?

sthalik commented 5 years ago

The best way forward is probably to branch off Point Tracker into a new module and implement tracking and pose estimation using OpenCV APIs. See cv::solveP3P

Sorry, no. That solver is very unstable; I tried it already for the Aruco tracker. Can you check in opencv's modules/src/calib3d.cpp whether it still requires 4 points?

Slion commented 5 years ago

Are your cap dimensions standard?

Yes, it's a genuine TrackIR TrackClip whose dimensions I believe match the default Point Tracker cap dimensions.

Can you see if pitching the actual camera up or down helps any?

It does not change the results much.

Slion commented 5 years ago

Sorry, no. That solver is very unstable, tried it already for the Aruco tracker.

That's odd. I mean, this is the most basic computer vision problem; it would be a shame if OpenCV did not do a good job at it. Anyway, I still think it's worth a try.

sthalik commented 5 years ago

I think you may be returning wrongly-scaled coordinates for X and Y. Either the aspect ratio is wrong for the X/Y image coords from Kinect, or PT doesn't deal well with non-4:3 aspect ratios. For the latter possibility, try a bilinear scaler from opencv.

The other solvers are shitty if you read them.

sthalik commented 5 years ago

Basically one other solver computes the solution numerically in 3 different ways and uses the one with the least reprojection error.

sthalik commented 5 years ago

@Slion now that I thought about it more, you'd need to crop the frame to 4:3 :(

Slion commented 5 years ago

Yes, it is possibly making a bunch of assumptions about the camera specs that don't work so well with Kinect.

sthalik commented 5 years ago

The issue is that all variants of POSIT assume the same focal length for both X and Y coordinates.
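
A centre crop along these lines would do it (sketch only):

```cpp
#include <opencv2/core.hpp>

// Sketch: centre-crop a frame to 4:3 so PT's single-focal-length
// assumption holds for both axes.
cv::Mat crop_to_4_3(const cv::Mat& frame)
{
    int w = frame.cols, h = frame.rows;
    if (w * 3 > h * 4)
        w = h * 4 / 3;   // frame too wide: trim the sides
    else
        h = w * 3 / 4;   // frame too tall: trim top and bottom
    const cv::Rect roi((frame.cols - w) / 2, (frame.rows - h) / 2, w, h);
    return frame(roi).clone();
}
```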

Slion commented 5 years ago

I hacked it together in that opencv-point-tracker branch and it is working well enough so far. Much better, it seems, than what I could get with the original Point Tracker.

@sthalik Don't bother too much with code review right now, as this is really just a proof of concept. Major refactoring and clean-up coming up. The one place where I need your feedback is where I modified the camera API so that the tracker can obtain information from the camera. That was needed for the tracker to fetch the camera intrinsics.

Slion commented 5 years ago

@sthalik That Easy Tracker is slowly getting there and I would like to contribute it to our unstable branch once you are happy with it.

Currently it's an OpenCV 3-point tracker. Facts:

TODOs:

sthalik commented 5 years ago

There's too much copy-pasted code. Is there anything in particular that you need done differently, other than using cv::solvePnP? Changing the pose estimation method is self-contained and short enough that PT itself can do that. Things like handling a particular channel count or element type can be added to PT as well.

I've run into problems with using contours (I had it implemented in PT for a release or two). There are edge cases that don't work in your implementation. It's noisier in general and sometimes goes totally out of whack.

Finally I just can't have that much copy-pasted code in-tree. Can we arrive at something that's more maintainable?

sthalik commented 5 years ago

Support OpenCV face tracking via settings options.

If you're going for it with cascade classifiers, it's not gonna end well. Been there, done that. Better to look at CLandmark. Even the old "flandmark" library was fast and pretty damn accurate. I made the mistake of not using keypoints, but the basic idea is there: make a 3D face mesh and project a given coordinate to get the Z value.

Support 3 points colour tracking using HSV and key colour from settings.

That's doable, but are there any users? Is there any solid advantage compared to existing cap tracking? I've had people asking for single-point tracking, but not really for colored points.

Slion commented 5 years ago

There's too much copy-pasted code.

Only the settings and UI remain much the same as Point Tracker's, and even those will eventually evolve. Everything else has been rewritten. I have much simplified the architecture, getting rid of the various frame and camera objects Point Tracker was using, which in fact makes it a lot easier to maintain than Point Tracker.

However, it is true that in theory you could improve the architecture of Point Tracker to get to the same result, but that would be a lot more complicated since you would need to keep the existing feature set working, so branching was my safest bet. Keep in mind that I don't even have any hardware I could test Point Tracker with.

Let's forget about possible future evolution for now as it seems you don't even want Easy Tracker as it is.

Slion commented 5 years ago

I've run into problems with using contours

Well, it works just great here. I had a long testing session in MWO and it was rather flawless: Kinect + Easy Tracker + Accela filter. Though I had to max out both Accela smoothing and deadzone; we ought to provide more range on those settings sliders.

You also mentioned you had problems with cv::solveP3P. As implemented in Easy Tracker, together with Kinect, and I'm assuming with any IR camera providing proper intrinsics, it works extremely well.

[image]

sthalik commented 5 years ago

You also mentioned you had problems with cv::solveP3P

Since there's a new solver I should try it again.

I'm assuming with any IR camera providing proper intrinsic

Please check it with clips and regular webcams as well.

Slion commented 5 years ago

Please check it with clips and regular webcams as well.

As mentioned above, clips and custom models are not currently supported, only caps for now. Clip support should be straightforward to implement; if someone is interested, and willing to test, I could try implementing it blind, as I don't own a clip. As for regular webcams, I can use the colour buffer from my Kinect, but it won't be able to solve anything, since it needs the camera intrinsics, and the point extractor would need more logic to track a specified colour. Currently camera intrinsics can only be provided by implementing your own video::impl::camera, though it should be easy to add fields in the settings dialog for users to provide camera intrinsics themselves.

sthalik commented 5 years ago

Who will maintain your tracker when you're no longer active in your project?

if someone is interested, and willing to test, I could try implementing it blind

From my experience that doesn't work. Ask people for camera captures instead, and just make them compress them. I got a ton of 500 MB videos lasting just a few seconds.

it won't be able to solve anything as it needs the camera intrinsics

You can add something returning std::tuple<bool, intrinsics> to the camera impl.
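
Something like this, roughly (the intrinsics struct here is hypothetical):

```cpp
#include <tuple>

// Hypothetical intrinsics struct, for illustration only.
struct intrinsics { double fx, fy, cx, cy; };

struct camera
{
    // ... the rest of the existing video::impl::camera interface ...

    // Default: intrinsics not provided; a Kinect backend would override this.
    virtual std::tuple<bool, intrinsics> get_intrinsics() const
    {
        return { false, {} };
    }
};
```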

it should be easy to add fields in the settings dialog for users to provide camera intrinsics themselves

Just don't :(

Overall the intrinsics are easily derived from the FOV. The distortion is so low on the PS3 Eye that there's no point storing it.
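
For a pinhole model with square pixels and a centred principal point, roughly (a sketch; the real code would use the configured FOV):

```cpp
#include <cmath>
#include <opencv2/core.hpp>

// Sketch: derive the camera matrix from a horizontal FOV in degrees,
// assuming square pixels and a principal point at the frame centre.
cv::Matx33d intrinsics_from_fov(double hfov_deg, int width, int height)
{
    // f = (w/2) / tan(hfov/2), with hfov converted to radians
    const double f = (width / 2.0) / std::tan(hfov_deg * CV_PI / 360.0);
    return cv::Matx33d(f, 0, width  / 2.0,
                       0, f, height / 2.0,
                       0, 0, 1);
}
```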

Slion commented 5 years ago

Who will maintain your tracker when you're no longer active in your project?

If maintaining a tracker is too time-consuming and the owner is not reachable to do it himself, feel free to drop it.

From my experience that doesn't work.

It's rare but it does happen.

You can add something returning std::tuple<bool, intrinsics> to the camera impl.

In the current implementation they live in video::impl::camera::info.

Just don't :(

Well, I'm not crazy about it either, but why not. It would enable supporting any camera without changing the code.

Overall the intrinsics are easily derived from the FOV.

Are they now? I have not looked into it, but I'm guessing that if all you needed was the FOV then nobody would have come up with the notion of intrinsics. However, I'm pretty sure you can make an educated guess about the intrinsics if you have both the vertical and horizontal FOV.

The distortion is so low on the PS3 Eye there's no point storing it.

Distortion on most cameras probably won't matter much for our use case, where the user is usually very central. Still, it's nice to have.

Slion commented 5 years ago

@sthalik Thanks for the code review. Maybe we should do it on the pull request though, as I'm concerned we are sometimes reviewing code that has already been changed.

sthalik commented 5 years ago

Good idea. I was having trouble commenting on the changes as well.

Slion commented 5 years ago

I just gave it a try with ED and it's really awesome, though I've had to change the range of the Accela settings. Here is what I was using:

[image]

sthalik commented 5 years ago

That's a boatload of smoothing. I normally use .55 with a .1 deadzone with PT.

Slion commented 5 years ago

That's a boatload of smoothing.

Compared to your settings, indeed; I'm not sure why that is. It was the same issue with the Kinect face tracker. Though 5 degrees and 1 degree do not feel wrong at all, and there is also no way PT has a precision of 0.1 degree. Then again, I have no idea how deadzone and smoothing are used in Accela.

I normally use .55 with a .1 deadzone with PT.

Which hardware?

Slion commented 5 years ago

I just realised the default cap dimensions did not exactly match my cap's. cap_x is more like 35 mm instead of 40, cap_y is more like 55 mm instead of 60, and cap_z is correct at 100 mm.

sthalik commented 5 years ago

Does this help any when tracking, though?

sthalik commented 5 years ago

Which hardware?

An IR clip with flat-shaved LEDs, an FOV of 56, and a 10 px radius for each blob on-screen.

Slion commented 5 years ago

An IR clip with flat-shaved LEDs, fov of 56 and 10 px radius for each blob on-screen.

Using a PS3 Eye?

Slion commented 5 years ago

Does this help any when tracking, though?

I could swear it's a little more stable, though it's hard to tell for sure until more testing is done. The one obvious benefit is that it provides a more accurate Z offset. Also, I had a minor issue: when facing straight and pitching down extremely, the solver would return a yaw of around -20 degrees. Now with better model specs that issue is almost gone too.

Slion commented 5 years ago

My solution is noisier and needs more filtering, both with PT and Easy Tracker, probably because of the lower resolution: my IR frame is only 512 by 424 against 640 by 480 for the PS3 Eye. That's the only reason I can think of.

I'm afraid I'll have to leave it at that until I get hold of an Azure Kinect which comes with an IR frame of 1024 by 1024.

There are a bunch of things that could be attempted to improve our precision without using active markers. Maybe trying to track the passive markers in the RGB frame but that probably won't work well in low light. Certainly not worth the effort.

JimmyRaoUF commented 5 years ago

Please check it with clips and regular webcams as well.

As mentioned above, clips and custom models are not currently supported, only caps for now. Clip support should be straightforward to implement; if someone is interested, and willing to test, I could try implementing it blind, as I don't own a clip. […]

@Slion Hi Slion, I have built a custom clip and would like to test it. Can you try implementing that and share the built program? By the way, what's the essential difference between a cap and a clip if they both use 3 points and reflective markers?

Slion commented 5 years ago

The difference between a clip and a cap is in the vertex layout.
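
Roughly speaking, the cap spreads its points horizontally towards the front of the head, while the clip stacks them vertically at the side with different depth offsets. A sketch with made-up millimetre values:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Illustrative vertex layouts only; real dimensions come from the settings.
// Cap: a top point plus two points spread left/right towards the front.
const std::vector<cv::Point3f> cap_model = {
    {   0.f,   0.f,   0.f },  // top of the head
    { -17.5f, 55.f, 100.f },  // front left
    {  17.5f, 55.f, 100.f },  // front right
};
// Clip: three points in a single vertical plane at the side of the head.
const std::vector<cv::Point3f> clip_model = {
    { 0.f,   0.f,  0.f },  // middle LED
    { 0.f,  40.f, 30.f },  // top LED, offset back
    { 0.f, -70.f, 80.f },  // bottom LED, offset further back
};
```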