psmoveservice / PSMoveService

A background service that communicates with the PSMove controller and stores pose and button data.
Apache License 2.0

tracker & controller need information from each other to calculate most recent pose. #12

Closed cboulay closed 8 years ago

cboulay commented 8 years ago

Currently the device managers call .update() and that's it. However, to calculate pose, the tracker(s) need to know the controller colours, and the controllers need to know the latest position from the tracker(s). So we need to replace the device manager .update() with 3 steps and change the order around a little.

void DeviceManager::update()
{
    m_controller_manager.update();
    m_tracker_manager.update();
    m_hmd_manager.update();
}

becomes

void DeviceManager::update()
{
    m_tracker_manager.poll(); // Update tracker count and poll video frames
    m_controller_manager.poll(); // Update controller counts and poll button/IMU state
    m_hmd_manager.poll(); // Update tracker count

    m_tracker_manager.updateStateAndPredict(); // Get controller colors and update tracking blob positions/predictions
    m_controller_manager.updateStateAndPredict(); // Compute pose/prediction of tracking blob+IMU state
    m_hmd_manager.updateStateAndPredict(); // Get the pose + prediction for HMD

    m_tracker_manager.publish(); // publish tracker state to any listening clients (probably only used by ConfigTool)
    m_controller_manager.publish(); // publish controller state to any listening clients  (common case)
    m_hmd_manager.publish(); // publish hmd state to any listening clients (probably only used by ConfigTool)
}

The device managers still need information from each other. Maybe they can do this via the DeviceManager singleton (i.e., DeviceManager::getInstance()->getControllers) or we can pass in context arguments to the updateStateAndPredict functions.
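
For illustration only, the two options would shape the code roughly like this (getTrackerManager() and the context-argument overload are hypothetical, not existing code):

// Option A: reach through the DeviceManager singleton from inside the manager
void ControllerManager::updateStateAndPredict()
{
    TrackerManager *trackers = DeviceManager::getInstance()->getTrackerManager(); // hypothetical accessor
    // ... each controller queries 'trackers' for its latest blob position ...
}

// Option B: pass the dependency in explicitly as a context argument
void ControllerManager::updateStateAndPredict(TrackerManager *tracker_manager)
{
    // ... same lookup, but the coupling is visible in the signature ...
}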

cboulay commented 8 years ago

I've been thinking on the optical tracker algorithms a bit. My best approximation of what I think Oliver Kreylos is doing can be found in the wiki.

I'd like to test this out, but I'm getting ahead of myself; there's some work to be done on PSMoveService first.

@HipsterSloth What's up next for pose estimation? I'd like to work on this a little but I want to do so in a way that doesn't step on anything you're working on.

  1. Give the tracker a list of tracked_object_infos, where each entry contains:
    • colour of object (R, G, B)
    • shape of object (for now, only a sphere of radius 2.25 cm)
  2. Get the 3D position of each object (see the sketch after this list). For each object:
    • For each camera:
      • Get the ROI if we are using ROIs
      • Filter / mask the ROI for the object colour
      • Get the contour
      • Use one of the tracking algorithms:
        • If only one camera, we'll use a 3D algorithm.
        • If multiple cameras, use a 2D algorithm (center of mass?).
    • If multiple cameras, combine the 2D positions to get a 3D position.
  3. Controller updates its pose state estimation.
    • How will the controller get its 3D position? i.e., does the controller have to know about its tracker?
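
As a rough sketch of the colour-masking and contour step in item 2 above, using plain OpenCV calls; the TrackedObjectInfo struct, the HSV colour range, and the helper name are illustrative, not existing code:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Illustrative description of one tracked object
struct TrackedObjectInfo
{
    cv::Scalar hsv_min, hsv_max;  // colour range of the object's bulb
    float radius_cm;              // sphere radius (2.25 cm for the PSMove bulb)
};

// Find the largest colour-matching contour in one camera frame (or ROI)
static bool findObjectContour(
    const cv::Mat &bgr_frame,
    const TrackedObjectInfo &object,
    std::vector<cv::Point> &out_contour)
{
    cv::Mat hsv, mask;
    cv::cvtColor(bgr_frame, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, object.hsv_min, object.hsv_max, mask);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return false;

    // Keep the contour with the largest area
    out_contour = *std::max_element(
        contours.begin(), contours.end(),
        [](const std::vector<cv::Point> &a, const std::vector<cv::Point> &b) {
            return cv::contourArea(a) < cv::contourArea(b);
        });
    return true;
}

The contour (or just its centroid, in the multi-camera 2D case) would then feed into whichever algorithm step 2 selects.
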
HipsterSloth commented 8 years ago

@cboulay After I fixed this ps3eye crash issue I wanted to port over the tracker pose estimation tool I wrote for psmove-unity5 since that is a pre-req for multi-camera tracking. That tool won't work fully until we have the controller tracking position (x, y, r) for each tracker getting computed, but I can get the tracker data stream interface setup that will provide that data when requested. Once the blob tracking is implemented we can plug it in on the server side.

As for the question of how the controller gets its 3D position, I was thinking the update might operate like this:

cboulay commented 8 years ago

While my older 1-cam 3D algorithm does capture blob x,y,r (in pixels) as an intermediate step, it also requires another ellipse parameter. The x,y,r table is insufficient here. The newer 1-cam 3D algorithm does not use the blob centroid or radius at all; it requires the coordinates of each point in the contour. If we want 2D position for other applications then I can add that in because it is not too processor intensive, but it's still useless for 3D position estimation using the second algorithm. The x,y,r table is insufficient here too.

So, maybe they need to keep the full contour. Then it's up to the ServerControllerView to do the position estimation, choosing the algorithm depending on how many contours it finds. This also has the added advantage that the controller will know its shape. But the controller doesn't know the camera parameters. Maybe the controller can ask the contour-owning tracker to do the calculations that it needs, returning x_px, y_px, r_px if needed, or returning x_cm, y_cm, z_cm if that's what's needed. Then, if it was the former, the controller can do the triangulation with the collection of x_px, y_px, r_px.

Alternatively, the ServerTrackerView can choose the algorithm if it knows how many cameras can see the object. This creates a weird coupling and order-of-operations issue. First, all the ServerTrackerViews get all the contours. Second, the tracker manager does 3D position estimation for each controller based on how many contours it can find for each controller. Thus we're moving a lot of the logic to the tracker manager. Third, the ServerControllerView simply gets its 3D position from the tracker manager.
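
To make the two shapes concrete, here is roughly how the calls would differ (all names here are hypothetical; neither exists yet):

// Approach 1: controller-driven. Each ServerControllerView walks the trackers and
// asks the contour-owning tracker to do the per-camera math it needs.
void ServerControllerView::updateStateAndPredict(TrackerManager *tracker_manager);
bool ServerTrackerView::getPositionForObject(const IControllerInterface *controller,
                                             float &out_x, float &out_y, float &out_z);

// Approach 2: tracker-manager-driven. The manager gathers all contours, does the
// per-controller estimation itself, and the controller just reads the result.
void TrackerManager::updateStateAndPredict();
bool TrackerManager::getEstimatedPosition(int controller_id,
                                          float &out_x, float &out_y, float &out_z);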

Which approach works better?

HipsterSloth commented 8 years ago

I think the first approach (asking the contour-owning tracker to do the calculations it needs) sounds best since that avoids the weird coupling you mentioned. The TrackerView (in my opinion) is all about maintaining blob state and providing helper functions for computing properties about the blobs. The controller pulls all the tracker and IMU state together to do the final fusion step.

To the point about different tracking contours, after we get 1.0 out I'm going to start working on getting DualShock4 tracking functional because we need it at work for our game. Sony has decided not to rev their PSMove controller to include a thumb-stick (despite pleas from many devs). This means that if you want tracked controllers with thumbsticks, we're left with using the DS4, which has a very different blob contour:

[image: ps4controller-5 (DS4 blob contour)]

I think they just use the blob centroid when doing DS4 tracking rather than any kind of fancy curve fitting. The point being: if we make the tracker interface support the controller querying either the 3D position or the raw centroid, we can support the DS4 more easily in the future.
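
That tracker interface could be shaped something like this (purely illustrative; none of these methods exist yet):

// Sketch of possible query methods on ServerTrackerView (not the real class)
class ServerTrackerView
{
public:
    // DS4-style path: just the raw 2D blob centroid (and size) in pixels
    bool getBlobCentroidForObject(const IControllerInterface *controller,
                                  float &out_x_px, float &out_y_px, float &out_radius_px);

    // PSMove-style path: full single-camera 3D estimate from the contour / cone fit
    bool getPosition3dForObject(const IControllerInterface *controller,
                                float &out_x_cm, float &out_y_cm, float &out_z_cm);
};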

Question about the new cone fitting notes. When we extract the contour and then fit an ellipse to it, all we need to maintain at that point is the ellipse centroid and the axes (major and minor), correct? We can throw out the contour at that point, right? Also, is it the case that the centroid of the fit ellipse is the 2D projection of the 3D tracked position of the center of the tracking bulb? From the picture you have on the page it looks like the case, but I wanted to make sure I was reading that right.

EDIT: Also, if you wanted to try an alternative to OpenCV's fitEllipse method, you could try the min volume ellipse fitting method I wrote for the magnetometer calibration (eigen_alignment_fit_min_volume_ellipsoid). It can be very easily modified to work in 2D. It was based on this 2D version: http://stackoverflow.com/questions/1768197/bounding-ellipse/1768440#1768440.
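
For reference, here's roughly what the OpenCV route looks like on a contour from cv::findContours (this is just standard fitEllipse usage, not project code):

#include <opencv2/opencv.hpp>
#include <vector>

// Fit an ellipse to a colour-blob contour and read back the parametric values
bool fitContourEllipse(const std::vector<cv::Point> &contour,
                       cv::Point2f &out_center, cv::Size2f &out_axes, float &out_angle_deg)
{
    if (contour.size() < 5)  // fitEllipse needs at least 5 points
        return false;

    const cv::RotatedRect ellipse = cv::fitEllipse(contour);
    out_center = ellipse.center;    // ellipse centre in pixels
    out_axes = ellipse.size;        // full axis lengths in pixels (major/minor order not guaranteed)
    out_angle_deg = ellipse.angle;  // rotation of the ellipse
    return true;
}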

cboulay commented 8 years ago

For the DS4, just the centroid would only work if you have 2 cameras. With one camera you might still be able to get something. In the above picture, if you think about it as a really short and wide T, the ends of the longer part of the T, the intersection with the shorter line, and the distal end of the shorter line give you 4 points with known physical spacing. I think you can get a 3D position from that.

Anyway, for the cone fitting, while the least squares solution yields the parameters for the standard form equation of any conic section ax^2 + by^2 + cxy + dx + ey + f = 0 (including ellipses), those don't yield the common ellipse parameters (x0, y0, a', b', phi) without significant effort (see equations 19-23). And those standard ellipse parameters aren't at all necessary for calculating the 3D position using this method.
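
For reference, a minimal sketch of that least squares step using Eigen (which the project already uses elsewhere); it just recovers the six conic coefficients, up to scale, from the contour points and deliberately does not convert them to (x0, y0, a', b', phi):

#include <Eigen/Dense>
#include <vector>

// Fit the general conic a*x^2 + b*y^2 + c*x*y + d*x + e*y + f = 0 to 2D points.
// Returns the coefficient vector [a, b, c, d, e, f] (defined only up to scale).
// Assumes points.size() >= 6.
Eigen::Matrix<double, 6, 1> fitConicLeastSquares(const std::vector<Eigen::Vector2d> &points)
{
    Eigen::MatrixXd D(points.size(), 6);
    for (int i = 0; i < static_cast<int>(points.size()); ++i)
    {
        const double x = points[i].x();
        const double y = points[i].y();
        D.row(i) << x * x, y * y, x * y, x, y, 1.0;
    }

    // The coefficients are the right singular vector associated with the smallest
    // singular value of the design matrix (its approximate null space).
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(D, Eigen::ComputeThinV);
    return svd.matrixV().col(5);
}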

Also is it the case that the centroid of the fit ellipse the 2D projection of the of the 3D tracked position of the center of the tracking bulb?

No. The centroid of the ellipse is not on the vector from the origin to the center of the sphere. The image in the link (not embedded in the wiki page) shows the xz plane (i.e. overhead view), and the xy plane (camera image plane) is the horizontal dotted line. The ellipse lies on the xy plane that we don't see, but two of the points on the ellipse are A1 and A2 near (but not exactly) the upper-right and lower-left corners of the ellipse.

HipsterSloth commented 8 years ago

No. The centroid of the ellipse is not on the vector from the origin to the center of the sphere. The image in the link (not embedded in the wiki page) shows the xz plane (i.e. overhead view), and the xy plane (camera image plane) is the horizontal dotted line. The ellipse lies on the xy plane that we don't see, but two of the points on the ellipse are A1 and A2 near (but not exactly) the upper-right and lower-left corners of the ellipse.

Ah ok. This is where I was getting hung up. I couldn't find the image you were referring to with: "The image in the link (not embedded in the wiki page)". Perhaps you meant something like this one from stack exchange.

[image: rawdq (sphere projection figure from Stack Exchange)]

When computing the 2D screen position used for the multicam case, we should take the 3D position computed from the cone fit and then use the pinhole camera matrix (computed from multicam calibration) to project the 3D point onto the screen. If you get the 3D position exposed to the tracker view, I can add the 2D projection part (I already have something like that in the multi-cam calibration tool).
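
For concreteness, that projection step is basically a single cv::projectPoints call; a minimal sketch, assuming the tracker can hand over its intrinsics, distortion coefficients, and pose (as a Rodrigues rvec/tvec pair):

#include <opencv2/opencv.hpp>
#include <vector>

// Project a tracking-space 3D point onto one tracker's image plane.
// cameraMatrix, distCoeffs, rvec and tvec are assumed to come from the multicam calibration.
cv::Point2f projectToTracker(
    const cv::Point3f &position3d,
    const cv::Matx33f &cameraMatrix,
    const cv::Mat &distCoeffs,
    const cv::Mat &rvec,   // 3x1 camera rotation (Rodrigues)
    const cv::Mat &tvec)   // 3x1 camera translation
{
    const std::vector<cv::Point3f> objectPoints(1, position3d);
    std::vector<cv::Point2f> imagePoints;

    cv::projectPoints(objectPoints, rvec, tvec, cameraMatrix, distCoeffs, imagePoints);

    return imagePoints[0];
}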

For the DS4, just the centroid would only work if you have 2 cameras. With one camera you might still be able to get something. In the above picture, if you think about it as a really short and wide T, the ends of the longer part of the T, the intersection with the shorter line, and the distal end of the shorter line give you 4 points with known physical spacing. I think you can get a 3D position from that.

It was our plan at the office to use two cameras for tracking. I was going to add in a new tracker that uses the ps4eyedriver project. I got some of those nifty custom USB adapters + adapter board you linked me to several months ago. Since the PS4 camera has two cameras in it, I can triangulate using blob centers.

That said, getting DS4 tracking working with one camera using the "squat T" method you suggested is a good idea. If I can fit the "T" to the blob then I could use cv::solvePnP to get the controller pose. I'll have to think on that some more.
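
A minimal sketch of that idea, assuming we can reliably detect the four "T" feature points in the image; the model-space coordinates below are placeholders, not measured DS4 dimensions:

#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the DS4 pose from four detected light-bar feature points.
// imagePoints must be in the same order as the model points below.
bool estimateDS4Pose(
    const std::vector<cv::Point2f> &imagePoints,  // the 4 detected "T" points, in pixels
    const cv::Matx33f &cameraMatrix,
    const cv::Mat &distCoeffs,
    cv::Mat &rvec, cv::Mat &tvec)                 // out: controller pose relative to the camera
{
    // Corresponding points on the controller, in controller space (placeholder spacing, cm)
    std::vector<cv::Point3f> modelPoints;
    modelPoints.push_back(cv::Point3f(-4.0f, 0.0f, 0.0f)); // left end of the long bar
    modelPoints.push_back(cv::Point3f( 4.0f, 0.0f, 0.0f)); // right end of the long bar
    modelPoints.push_back(cv::Point3f( 0.0f, 0.0f, 0.0f)); // intersection with the short bar
    modelPoints.push_back(cv::Point3f( 0.0f, 1.0f, 0.0f)); // distal end of the short bar

    return cv::solvePnP(modelPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);
}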

cboulay commented 8 years ago

The image I was referring to is the one that is linked to immediately below the sub-heading on cone fitting on the wiki page. Here is the direct link. (Can't embed until the repo is public)

cboulay commented 8 years ago

Here's the current block of code for DeviceManager::update():

void
DeviceManager::update()
{
    m_controller_manager->poll(); // Update controller counts and poll button/IMU state
    m_tracker_manager->poll(); // Update tracker count and poll video frames

    m_tracker_manager->updateStateAndPredict(); // Get controller colors and update tracking blob positions/predictions
    m_controller_manager->updateStateAndPredict(); // Compute pose/prediction of tracking blob+IMU state

    m_controller_manager->publish(); // publish controller state to any listening clients  (common case)
    m_tracker_manager->publish(); // publish tracker state to any listening clients (probably only used by ConfigTool)
}

This block of code is currently insufficient because the controllers do not have any knowledge of the trackers or vice versa. Each tracker will at least have to know what colours to search for. And, each controller will have to know the list of trackers that might have positional information.

I think this can be combined into one step so there is only a one-way dependency. If we instead pass m_tracker_manager as an argument to m_controller_manager->updateStateAndPredict() (i.e., m_controller_manager->updateStateAndPredict(m_tracker_manager);), then each controller can iterate through the list of trackers and query each for positional information (passing in filter colours), and then combine positional information if it gets more than 1. Then, we wouldn't even need a call to m_tracker_manager->updateStateAndPredict();.

Does that make sense? Am I missing anything?
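
Concretely, the combined version would just collapse to something like this (restating the paragraph above as code; nothing here exists yet):

void
DeviceManager::update()
{
    m_tracker_manager->poll();    // Update tracker count and poll video frames
    m_controller_manager->poll(); // Update controller counts and poll button/IMU state

    // Controllers pull positional data from the trackers themselves, so only the
    // controller manager needs an updateStateAndPredict pass.
    m_controller_manager->updateStateAndPredict(m_tracker_manager);

    m_controller_manager->publish(); // publish controller state to any listening clients
    m_tracker_manager->publish();    // publish tracker state to any listening clients
}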

cboulay commented 8 years ago

1beb6f7149d7b20f634b0b1bfba8143695ca0d07 gives a simple outline of what I'm talking about.

HipsterSloth commented 8 years ago

I think that seems fine. Does that mean that: A) tracker->getPositionForObject(this) does the work of searching for the color blob for the controller, or B) tracker->poll() finds all the blobs in the frame, recording the blobs it found that frame (not knowing which controller they are for), and then the call to getPositionForObject() searches through the blob list for a matching color for the given controller and computes a position from that?

cboulay commented 8 years ago

A. poll() doesn't know what colours to filter for so it can't find blobs.

cboulay commented 8 years ago

Just one more thought before bed: The object passed to tracker->getPositionForObject(<>) should be compatible with both PSMove controllers and DS4s (and anything else that we might want to optically track). Should that be ServerControllerViewPtr this or IControllerInterface *m_device? Note that this includes Navi controllers even though they can't be optically tracked.

Also, I'm starting to think there's going to have to be a little more back and forth between controller and tracker in terms of what part of the optical tracking happens where. The tracker will find a blob for a colour and ROI seed.

Who finds the controller's position from the blob? It can happen in the controller (good because the algorithm will likely be controller-type specific) or in the tracker (good because other OpenCV code is here).

Then the position might need to be transformed based on the camera's coregistration. This should probably happen in the tracker, but I'll have to look over your triangulation code before I know how to spread that out.

HipsterSloth commented 8 years ago

If tracker->getPositionForObject() needs to ask any questions about the controller derived from controller state (like orientation or filtered state), the ServerControllerView should be passed in. If it's just using raw controller state, the IControllerInterface should be passed in. My guess is that we'll want to add getTrackingShape() and getTrackingColor() methods to the IControllerInterface.
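
Those additions might look something like this (the shape struct and enum are placeholders, not existing types):

// Sketch of possible additions to IControllerInterface (names are placeholders)
enum eCommonTrackingShape
{
    TrackingShape_Sphere,   // PSMove bulb
    TrackingShape_LightBar  // DS4
};

struct CommonTrackingShape
{
    eCommonTrackingShape shape_type;
    float sphere_radius_cm;  // valid when shape_type == TrackingShape_Sphere
    // ... light-bar corner points in controller space for the DS4 case ...
};

class IControllerInterface
{
public:
    // ... existing methods elided ...
    virtual void getTrackingShape(CommonTrackingShape &outShape) const = 0;
    virtual void getTrackingColor(float &outR, float &outG, float &outB) const = 0;
};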

Also it occurs to me that if we can't find the controller position maybe we return a bool and make the position an out parameter? I think the back and forth could look something like this:

void ServerControllerView::updateStateAndPredict(TrackerManager* tracker_manager)
{
    int positions_found = 0;
    glm::vec3 position3d_list[k_max_trackers];
    int valid_tracker_ids[k_max_trackers];

    for (int tracker_id = 0; tracker_id < tracker_manager->getMaxDevices(); ++tracker_id)
    {
        ServerTrackerViewPtr tracker = tracker_manager->getTrackerViewPtr(tracker_id);
        if (tracker->getPositionForObject(this->getDevice(), position3d_list[positions_found]))
        {
            valid_tracker_ids[positions_found]= tracker_id;
            ++positions_found;
        }
    }

    if (positions_found > 1)
    {
        glm::vec2 position2d_list[k_max_trackers];
        for (int list_index= 0; list_index < positions_found; ++list_index)
        {
            int tracker_id= valid_tracker_ids[list_index];
            ServerTrackerViewPtr tracker = tracker_manager->getTrackerViewPtr(tracker_id);            

            // Convert the camera intrinsic matrix to an OpenCV matrix
            //const glm::mat4 glmCameraMatrix = tracker->getCameraIntrinsicMatrix();
            //cv::Matx33f cvCameraMatrix = glmMat4ToCvMat33f(glmCameraMatrix);    

            // Get the camera pose 
            //const glm::mat4 glmCameraPose = tracker->getCameraPoseMatrix();

            // Convert to openCV rvec/tvec
            //cv::Mat rvec(3, 1, cv::DataType<double>::type);
            //cv::Mat tvec(3, 1, cv::DataType<double>::type);            

            // Convert 3dpoint to an open cv point
            //cv::vector<cv::Point3f> cvObjectPoints;            

            // Project the 3d point onto the cameras image plane
            //cv::projectPoints(cvObjectPoints, rvec, tvec, cvCameraMatrix, cvDistCoeffs, projectedPoints);            
            // Add the projected point back to the 2d point list
        }        
        // Select the best pair of 2d points to use and feed them into
        // cv::triangulatePoints to get a 3d position
    }
    else
    {
        // Use the 3d position for the only valid tracker
    }    
    // IMU
    updateStateAndPredict();
}
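
For the triangulation step at the end of that sketch, a minimal example of feeding a selected pair of 2D points into cv::triangulatePoints; the 3x4 projection matrices (intrinsics times camera pose) are assumed to come from the multicam calibration:

#include <opencv2/opencv.hpp>
#include <vector>

// Triangulate one 3D point from a pair of 2D observations on two trackers.
cv::Point3f triangulatePair(
    const cv::Mat &projMatrix1, const cv::Mat &projMatrix2,  // 3x4 projection matrices
    const cv::Point2f &point1, const cv::Point2f &point2)    // the 2D observations, in pixels
{
    const std::vector<cv::Point2f> points1(1, point1);
    const std::vector<cv::Point2f> points2(1, point2);

    cv::Mat points4D;  // 4x1 homogeneous result
    cv::triangulatePoints(projMatrix1, projMatrix2, points1, points2, points4D);
    points4D.convertTo(points4D, CV_64F);  // normalise the element type before reading

    const double w = points4D.at<double>(3, 0);
    return cv::Point3f(
        static_cast<float>(points4D.at<double>(0, 0) / w),
        static_cast<float>(points4D.at<double>(1, 0) / w),
        static_cast<float>(points4D.at<double>(2, 0) / w));
}
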
cboulay commented 8 years ago

I wish I had found this document a couple weeks ago. Better late than never.

HipsterSloth commented 8 years ago

Oh dang! That's a great reference. It never occurred to me that you could just divide the conic representation through by the parameter 'A' and that you were really just dealing with 5 parameters. Which makes sense, I guess, since you really only need 5 parameters in the parametric case (x/y position, major/minor dimensions and rotation angle).

Quick note on the multi-cam work. Still heavily under construction, but for reference I just added a stub in the controller state publishing code (in the multicam branch) for sending down the raw tracking position per tracker. The camera pose configuration tool needs this data for cv::solvePnP so that I can relate a pixel location to a tracking-space location:

https://github.com/cboulay/PSMoveService/blob/multicam/src/psmoveservice/Device/View/ServerControllerView.cpp#L409-L411

This data only needs to be computed when the controller stream has the raw_tracker_data flag set (similar to the raw_sensor_data flag). Therefore I'm not sure if it's better to compute this projected position in ServerControllerView::updateStateAndPredict() and then cache it on the controller for publish() to use, or for the projected position to be calculated on the fly in publish(). If we wanted to calculate it on the fly, publish() would need to take in the TrackerManager similar to updateStateAndPredict(). This doesn't need an answer now, but it's something to mull on as updateStateAndPredict() gets filled in.
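
Just to make the trade-off concrete, the two options would be shaped roughly like this (the flag and member names are illustrative only):

// Option A: compute during updateStateAndPredict() and cache for publish()
void ServerControllerView::updateStateAndPredict(TrackerManager *tracker_manager)
{
    // ... position estimation ...
    if (m_raw_tracker_data_requested)  // hypothetical flag mirroring raw_sensor_data
    {
        // m_cached_raw_tracker_positions[tracker_id] = projected_point;
    }
}

// Option B: compute lazily at publish time, which then also needs the trackers
void ServerControllerView::publish(TrackerManager *tracker_manager);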

HipsterSloth commented 8 years ago

The framework for this is now in place and running. You can see the position data being fed through here in the tracker calibration tool: https://www.youtube.com/watch?v=FFJqLH_dkRw

I'm going to create separate tasks/bugs for the DS4 tracking and the ellipse fitting issue.