Closed: Slion closed this issue 5 years ago.
I'd rather avoid the checkbox and do something sensible when dynamic pose is used. The parameters can be conservative, e.g. over 80 degrees of pitch, 60 of yaw, or 50 of roll. You can also reset if the signed (non-absolute) value of TZ is less than 10.
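Something like the following sketch, assuming pose angles in degrees and a signed TZ; the struct and function names are made up, only the thresholds come from the numbers above:

```cpp
#include <cmath>

// Hypothetical pose snapshot; units assumed to be degrees and centimetres.
struct pose { double pitch, yaw, roll, tz; };

// Sketch of the proposed auto-reset check. Threshold values come from the
// message above (80/60/50 degrees, TZ below 10); names are illustrative.
bool should_auto_reset(const pose& p)
{
    return std::fabs(p.pitch) > 80 ||
           std::fabs(p.yaw)   > 60 ||
           std::fabs(p.roll)  > 50 ||
           p.tz < 10; // the signed ("non-absolute") TZ check
}
```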
I'm afraid the workaround won't do. For some reason last night I could use opentrack with the workaround but today I get so much rubbish from the tracker it's not manageable. It keeps happening even though the frames are perfect. We will have to get to the bottom of it. It makes me wonder how anyone can use it in that cap configuration. Looks like I'm gonna have to save a weekend to redo the maths.
@sthalik Any chance you could do a release that I could try, to confirm this is not an issue with my build?
Don't you have dynamic pose disabled in that particular case? It needs to be always enabled for caps.
I can do a release, just let's get that out of the way.
Don't you have dynamic pose disabled in that particular case?
It was enabled.
The points seem to be detected correctly. Note that the cyan circle outlines are in the right places. Can you confirm with a webcam? And if your image is bad, increase "dynamic pose timeout".
The points seem to be detected correctly.
It seems so. The issue is not with the point extraction but with the pose computation itself.
Can you confirm with a webcam?
You mean confirm the points extraction is correct? I pretty much know it is. I can change the scaling during bitmap transformation so that you can see a proper image in black and white and not just the IR reflectors.
And if your image is bad, increase "dynamic pose timeout"
I guess I could try changing that parameter.
I'm under the impression the problem occurs only past a certain FOV. The Kinect IR camera reports a diagonal field of view of 89.5. So I was setting it to 89. If I set it to 80 the problem still occurs. At 75 it still occurs. However at 70 it looks like I can't reproduce the issue anymore. Somehow the tracker gets messed up if the FOV is too high. Even more interesting, if you set the FOV to 90 the problem kicks in spontaneously a couple of seconds after resetting/centring. Though if I'm further away from the camera, even at FOV 90 the problem does not kick in. It is therefore a combination of FOV and model distance from the camera that triggers the issue. That would also explain why I was able to play ED for over an hour with an FOV of 89 without much issue. In fact when playing ED I'm further away from the Kinect than when developing.
A side effect of lowering the FOV is that our maths are a little off. It's very noticeable when looking at the translation vector, specifically the Z coordinate for instance. Nevertheless, it looks like this could be a good workaround until we fix our maths.
Could it all be caused by a loss of precision somewhere?
Alright, the problem is with PointTracker::POSIT choosing the wrong solution from the two it computes.
Could it all be caused by a loss of precision somewhere?
It certainly looks like it. When computing the solutions' deviations we cast them from double to float before comparing them. Bad idea!
It's possibly better without the cast to float but you can still have situations where the wrong solution is picked.
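The float-vs-double point can be seen in isolation: two deviation values (made up here) that differ as doubles become indistinguishable after the cast to float, so the comparison between the two POSIT solutions can go either way.

```cpp
// Hypothetical deviations of the two candidate POSIT solutions: distinct as
// doubles, but they collapse to the same value once cast to float, so a
// float comparison can no longer tell the solutions apart.
constexpr double dev_a = 1.00000001; // solution A, genuinely smaller
constexpr double dev_b = 1.00000002; // solution B

constexpr bool distinct_as_double = dev_a < dev_b;               // true
constexpr bool distinct_as_float  = (float)dev_a < (float)dev_b; // false: a tie
```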
After more testing I doubt switching from float to double brings anything. I could pretty much work around that wrong pose issue by adding X_CM_expected = {}; at the beginning of PointTracker::POSIT, but even then the pose is somewhat wrong. It keeps adding pitch and roll when you only turn your head left and right. You can compensate with mapping but that's far from ideal.
I'm afraid no matter what I do, the algorithm implemented by Point Tracker is not exactly Kinect-friendly.
The best way forward is probably to branch off Point Tracker into a new module and implement tracking and pose estimation using OpenCV APIs. See cv::solveP3P. See also ICoordinateMapper::GetDepthCameraIntrinsics from the Kinect API. It provides inputs for the OpenCV camera matrix as well as some distortion coefficients. The objectPoints and imagePoints parameters must contain the points in the same order.
After more testing I doubt switching from float to double brings anything.
Are your cap dimensions standard? Can you see if pitching the actual camera up or down helps any?
The best way forward is probably to branch off Point Tracker into a new module and implement tracking and pose estimation using OpenCV APIs. See cv::solveP3P
Sorry, no. That solver is very unstable; I tried it already for the Aruco tracker. Can you check opencv's modules/src/calib3d.cpp to see whether it still requires 4 points?
Are your cap dimensions standard?
Yes, it's a genuine TrackIR TrackClip whose dimensions I believe match the default ones from the Point Tracker cap.
Can you see if pitching the actual camera up or down helps any?
It does not change the results much.
Sorry, no. That solver is very unstable, tried it already for the Aruco tracker.
That's odd. I mean this is the most basic computer vision problem. It would be a shame if OpenCV did not do a good job at it. Anyway, I still think it's worth a try.
I think you may be returning wrongly-scaled coordinates for X and Y. Either the aspect ratio is wrong for X/Y image coords from Kinect, or PT doesn't deal well with non-4/3 aspect ratios. For the latter possibility try a bilinear scaler from opencv.
The other solvers are shitty if you read them.
Basically one other solver computes the solution numerically in 3 different ways and uses the one with the least reprojection error.
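The selection-by-least-reprojection-error idea can be sketched without OpenCV: project each candidate pose's points through a pinhole model and keep the pose whose projections land closest to the observed blobs. The structs and names below are illustrative, not opentrack code.

```cpp
#include <array>

struct vec2 { double x, y; };
struct vec3 { double x, y, z; };

// Pinhole projection, assuming the point is already in camera coordinates.
vec2 project(const vec3& p, double fx, double fy, double cx, double cy)
{
    return { cx + fx * p.x / p.z, cy + fy * p.y / p.z };
}

// Sum of squared pixel distances between projected and observed points; the
// candidate pose with the smallest value wins.
double reprojection_error(const std::array<vec3, 3>& cam_points,
                          const std::array<vec2, 3>& observed,
                          double fx, double fy, double cx, double cy)
{
    double err = 0;
    for (int i = 0; i < 3; i++)
    {
        vec2 proj = project(cam_points[i], fx, fy, cx, cy);
        double dx = proj.x - observed[i].x, dy = proj.y - observed[i].y;
        err += dx * dx + dy * dy;
    }
    return err;
}
```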
@Slion now that I've thought about it more, you'd need to crop the frame to 4:3 :(
Yes, it is possibly making a bunch of assumptions about the camera specs that don't work so well with Kinect.
The issue is that all variants of POSIT assume the same focal length for both X and Y coordinates.
I hacked it together in that opencv-point-tracker branch and it is working well enough so far. It seems much better than what I could get with the original Point Tracker.
@sthalik Don't bother too much with code review right now as this is just a proof of concept really. Major refactoring and clean-up coming up. The one place where I need your feedback is where I modified the camera API so that information can be obtained by the tracker from the camera. That was needed for the tracker to fetch the camera intrinsics.
@sthalik That Easy Tracker is slowly getting there and I would like to contribute it to our unstable branch once you are happy with it.
Currently it's an OpenCV 3-point tracker. Facts:
TODOs:
There's too much copy-pasted code. Is there anything in particular that you need done in a different manner, other than using cv::solvePnP? Changing the pose estimation method is self-contained and short enough that PT itself can do that. Things like handling a particular channel amount or element type can be added to PT as well.
I've run into problems with using contours (had it implemented in PT for a release or two). There are edge cases that don't work in your implementation. It's more noisy in general and sometimes goes totally out of whack.
Finally I just can't have that much copy-pasted code in-tree. Can we arrive at something that's more maintainable?
Support OpenCV face tracking via settings options.
If you're going for it with cascade classifiers it's not gonna end well. Been there, done that. Better look at CLandmark. Even the old "flandmark" library was fast and pretty damn accurate. I made the mistake of not using keypoints but the basic idea is there: make a 3D face mesh and project the given coordinate to get the Z value.
Support 3 points colour tracking using HSV and key colour from settings.
That's doable but are there any users? Is there any solid advantage compared to existing cap tracking? I've had people asking for single-point tracking but not really for colored points.
There's too much copy-pasted code.
Only the settings and UI remain much the same as Point Tracker, and even those will eventually evolve. Everything else has been rewritten. I have much simplified the architecture, getting rid of the various frame and camera objects Point Tracker was using, making it in fact a lot easier to maintain than Point Tracker.
However it is true that in theory you could improve the architecture of Point Tracker to get the same results, but that would be a lot more complicated as you would need to keep the existing feature set working, so branching was my safest bet. Keep in mind that I don't even have any hardware I could test Point Tracker with.
Let's forget about possible future evolution for now as it seems you don't even want Easy Tracker as it is.
I've run into problems with using contours
Well, it works just great here. I had a long testing session on MWO and it was rather flawless. Kinect + Easy Tracker + Accela filter. Though I had to max out both Accela smoothing and deadzone; we ought to provide more range on those settings sliders.
You also mentioned you had problems with cv::solveP3P. As implemented in Easy Tracker, together with Kinect, and I'm assuming with any IR camera providing proper intrinsics, it works extremely well.
You also mentioned you had problems with cv::solveP3P
Since there's a new solver I should try it again.
I'm assuming with any IR camera providing proper intrinsics
Please check it with clips and regular webcams as well.
Please check it with clips and regular webcams as well.
As mentioned above, clips and custom models are not currently supported; only caps for now.
Clips support should be straightforward to implement. If someone is interested and willing to test, I could try implementing it blind. I don't own a clip.
As for regular webcams, I can use the colour buffer from my Kinect but it won't be able to solve anything as it needs the camera intrinsics and more logic in the point extractor to be able to track a specified colour. Currently camera intrinsics can only be provided by implementing your own video::impl::camera. Though it should be easy to add fields in the settings dialog for users to provide camera intrinsics themselves.
Who will maintain your tracker when you're no longer active in your project?
if someone is interested, and willing to test, I could try implementing it blind
From my experience that doesn't work. Ask people for camera captures. Just make them compress them. I got a ton of 500 MB videos lasting just a few seconds.
it won't be able to solve anything as it needs the camera intrinsics
You can add something returning std::tuple<bool, intrinsics> to the camera impl.
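A minimal sketch of that accessor; the intrinsics struct, the method name, and the derived camera class are all assumptions, only the std::tuple<bool, intrinsics> shape comes from the suggestion above.

```cpp
#include <tuple>

// Hypothetical intrinsics bundle; field names are an assumption.
struct intrinsics { double fx, fy, cx, cy; };

// Sketch of an optional accessor on the camera impl: the bool says whether
// the backend can actually supply calibrated values.
struct camera
{
    virtual ~camera() = default;
    virtual std::tuple<bool, intrinsics> get_intrinsics() const
    {
        return { false, {} }; // default: no calibration available
    }
};

struct kinect_camera : camera
{
    std::tuple<bool, intrinsics> get_intrinsics() const override
    {
        // Values would come from ICoordinateMapper::GetDepthCameraIntrinsics;
        // the numbers here are placeholders.
        return { true, { 365.0, 365.0, 256.0, 212.0 } };
    }
};
```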
it should be easy to add fields in the settings dialog for users to provide camera intrinsics themselves
Just don't :(
Overall the intrinsics are easily derived from the FOV. The distortion is so low on the PS3 Eye there's no point storing it.
Who will maintain your tracker when you're no longer active in your project?
If maintaining a tracker is too time consuming and the owner is not reachable to do it himself feel free to drop it.
From my experience that doesn't work.
It's rare but it does happen.
You can add something returning std::tuple<bool, intrinsics> to the camera impl.
In the current implementation they live in video::impl::camera::info.
Just don't :(
Well, I'm not crazy about it either, but why not? It would enable supporting any camera without changing the code.
Overall the intrinsics are easily derived from the FOV.
Are they now? I have not looked into it, but I'm guessing that if all you needed was the FOV then nobody would have come up with the notion of intrinsics. However, I'm pretty sure you can make some educated guess about the intrinsics if you have both vertical and horizontal FOV.
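For what it's worth, under pinhole assumptions (no distortion, principal point at the image centre) that educated guess looks like this; the struct and function names are made up.

```cpp
#include <cmath>

struct intrinsics { double fx, fy, cx, cy; };

// Focal lengths in pixels from image size and horizontal/vertical FOV,
// assuming an undistorted pinhole camera with a centred principal point.
intrinsics from_fov(int width, int height, double hfov_deg, double vfov_deg)
{
    const double pi = 3.14159265358979323846;
    const double d2r = pi / 180.0;
    return {
        (width  / 2.0) / std::tan(hfov_deg * d2r / 2.0), // fx
        (height / 2.0) / std::tan(vfov_deg * d2r / 2.0), // fy
        width  / 2.0,                                    // cx, assumed central
        height / 2.0,                                    // cy, assumed central
    };
}
```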
The distortion is so low on the PS3 Eye there's no point storing it.
Distortion on most cameras probably won't matter much for our use case, where the user is usually very central. Still, it's nice to have.
@sthalik Thanks for the code review. Maybe we should do it on the pull request though as I'm concerned we are sometimes reviewing code that has already been changed.
Good idea. I was having trouble commenting on the changes as well.
I just gave it a try with ED and it's really awesome. Though I've had to change the range of Accela settings. Here is what I was using:
That's a boatload of smoothing. I normally use .55 with a .1 deadzone with PT.
That's a boatload of smoothing.
Compared to your settings, indeed. I'm not sure why that is. It was the same issue with the Kinect face tracker. Though 5 degrees and 1 degree do not feel wrong at all. There is also no way PT has a precision of 0.1 degree. Then again, I have no idea how deadzone and smoothing are used in Accela.
I normally use .55 with a .1 deadzone with PT.
Which hardware?
I just realised the default cap dimensions did not exactly match the ones from my cap. cap_x is more like 35mm instead of 40. cap_y is more like 55mm instead of 60. cap_z is correct at 100mm.
Does this help any when tracking, though?
Which hardware?
An IR clip with flat-shaved LEDs, fov of 56 and 10 px radius for each blob on-screen.
An IR clip with flat-shaved LEDs, fov of 56 and 10 px radius for each blob on-screen.
Using a PS3 Eye?
Does this help any when tracking, though?
I could swear it's a little more stable; hard to tell for sure until more testing is done. The one obvious benefit is that it does provide a more accurate Z offset. Also, I'd had a minor issue: when facing straight and pitching down hard, the solver would return some yaw around -20 degrees. Now with better model specs that issue is almost gone too.
My solution is noisier and needs more filtering, both with PT and Easy tracker, probably because of the lower resolution. My IR frame is only 512 by 424 against 640 by 480 for the PS3 Eye. That's the only reason I can think of.
I'm afraid I'll have to leave it at that until I get hold of an Azure Kinect which comes with an IR frame of 1024 by 1024.
There are a bunch of things that could be attempted to improve our precision without using active markers. Maybe trying to track the passive markers in the RGB frame but that probably won't work well in low light. Certainly not worth the effort.
Please check it with clips and regular webcams as well.
As mentioned above, clips and custom models are not currently supported, only caps for now. Clips support should be straightforward to implement. If someone is interested, and willing to test, I could try implementing it blind. I don't own a clip. As for regular webcams, I can use the colour buffer from my Kinect but it won't be able to solve anything as it needs the camera intrinsics and more logic in the point extractor to be able to track a specified colour. Currently camera intrinsics can only be provided by implementing your own video::impl::camera. Though it should be easy to add fields in the settings dialog for users to provide camera intrinsics themselves.
@Slion Hi Slion, I have built a custom clip and would like to test it; can you try implementing that and share the built program? By the way, what's the fundamental difference between a cap and a clip if they are both 3 points and use reflective markers?
The difference between a clip and a cap is in the vertex layout.
The focus of this issue changed from trying to make the original Point Tracker work with Kinect to implementing a new generic Easy Tracker primarily designed for Kinect.
New Idea
We are implementing Easy Tracker using OpenCV's cv::solveP3P to estimate our pose. Later on we may consider implementing support for several pose estimation solutions. For instance, Easy Tracker could support 3-point model tracking and OpenCV face tracking. Branch: opencv-point-tracker
We are currently using the original Point Tracker solution to extract points from the image. At some point we ought to replace it with something like the "Ball Tracking using Kalman filter" approach. I reckon that should solve our filtering issues. Filtering on the frame is certainly the best way to go about it, albeit probably not the cheapest one in terms of CPU usage.
Original Idea
Point Tracker fails to recover after a spell with a lot of noise in your IR frame buffer. That's typically the case when you are too close to your Kinect, for instance. An easy workaround for such issues is to implement an auto-reset feature.
It could be done by adding an "auto reset" checkbox in our settings and possibly some user-defined parameters such as:
Should we somehow check the mapping settings for those parameters rather than adding new ones?