Hi @omar-ogm, thank you for the kind comments on Norfair!
First of all, let me answer your question about fast-moving objects: Norfair should be able to handle them as long as the trajectory is not too erratic.
The most important thing here is to choose a good distance_function and a distance_threshold. What are you using for them?
For context, if the objects move too much between frames, iou will always give you the maximum distance of 1, which doesn't rank the closest objects properly. In this case, you might need to look into one of the norm-like distances. normalized_mean_euclidean, for instance, is a good starting point since it normalizes the distance by the frame size (a distance of 1 means opposite corners of the frame), but frobenius, mean_euclidean, and mean_manhattan should also work correctly.
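As a rough sketch of the idea (a hand-rolled normalized distance; the built-in ones do essentially this, so take it only as an illustration, and tune the threshold to your case):

import numpy as np
from norfair import Tracker

FRAME_WIDTH, FRAME_HEIGHT = 1280, 720  # adjust to your video
FRAME_DIAGONAL = np.hypot(FRAME_WIDTH, FRAME_HEIGHT)

def normalized_euclidean(detection, tracked_object):
    # distance between the detection and the tracker's current estimate,
    # scaled so that ~1.0 corresponds to opposite corners of the frame
    return np.linalg.norm(detection.points - tracked_object.estimate) / FRAME_DIAGONAL

tracker = Tracker(
    distance_function=normalized_euclidean,
    distance_threshold=0.2,  # illustrative: fraction of the frame diagonal a match may span
)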
Regarding the MotionEstimator, this functionality is meant to estimate the movement of the camera, not the movement of the objects. If your camera is static, it won't help you.
The motivation is that when the camera moves, the apparent trajectory of the objects becomes too erratic which confuses the tracker. By estimating the movement of the camera we can compensate for it and "stabilize" the objects so that they are easier to track.
Hope this helps. If you are still having issues, share more details on the parameters and, if possible, a sample video so that we can analyze it further; complex scenarios are always welcome as an opportunity to improve the library.
Hi @javiber, thanks for your quick response.
First of all, I will provide some images as an example; these are three consecutive images. The images come from a camera mounted on a vehicle on the tracks; the objects appear at the bottom of the image and travel upwards until they finally disappear:
First of all, let me answer your question about fast-moving objects, Norfair should be able to handle them as long as the trajectory is not too erratic. The most important thing here is to choose a good distance_function and a distance_threshold. What are you using for it?
The trajectory is not erratic, although it is non-linear; in fact, it would be similar to a logarithm, so this may be affecting the Kalman filter. Anyway, this is not really a problem. An object may appear only 5 times from the bottom to the top.
I'm currently using the center points of the bounding boxes of my detections for the matching. Then I'm using euclidean_distance (norm 2), not normalized though.
The problem is that the Kalman filter doesn't have time to converge since we are only going to see the object about 5 times. So I had to increase the velocity variance with OptimizedKalmanFilterFactory(vel_variance=10), otherwise it won't keep up with the movement. Also, I had to reduce the initialization delay to initialization_delay=1 so it could start predicting the new pose of the tracked object from the beginning.
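Concretely, the setup looks roughly like this (simplified; the threshold value is illustrative and boxes_to_detections just stands in for my own detection code):

import numpy as np
from norfair import Detection, Tracker
from norfair.filter import OptimizedKalmanFilterFactory

def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

tracker = Tracker(
    distance_function=euclidean_distance,
    distance_threshold=100,  # pixels, illustrative
    initialization_delay=1,
    filter_factory=OptimizedKalmanFilterFactory(vel_variance=10),
)

def boxes_to_detections(boxes):
    # one Detection per bounding box, using its center point
    return [
        Detection(points=np.array([[(x1 + x2) / 2, (y1 + y2) / 2]]))
        for (x1, y1, x2, y2) in boxes
    ]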
The problem is obvious: we can see an error in the second image, where tracked objects 6 and 7 should have been matched to the upper detections and not to the ones that have just appeared. Since the new objects that appear in the scene are very close in position to the previous positions of the objects in the first image, the objects are incorrectly matched. If I use euclidean_distance or any distance-based algorithm (pixel distance), this is always going to be a problem unless I include optical-flow information that helps to understand the scene movement, so that the tracked objects' predicted positions are based on that information.
Regarding the MotionEstimator, this functionality is meant to estimate the movement of the camera not the movement of objects. If your camera is static, it won't help you.
As you can see in the example, it's a railway, so the movement comes from the camera (attached to a vehicle) and the objects are not really moving; that's why I expected the MotionEstimator to help. Now I hope my first comment on this thread makes more sense.
The thing is that the results from the sparse optical flow in the MotionEstimator are really bad if I draw them. An example of the MotionEstimator I used: motion_estimator = MotionEstimator(max_points=200, min_distance=100, block_size=9, transformations_getter=HomographyTransformationGetter(), draw_flow=True, quality_level=0.01). The parameters are just some more or less random values; I've tried different ones, like increasing min_distance and block_size, but the results are poor. The matching between points in the MotionEstimator is not good, so the optical flow estimation doesn't help. I was trying to tweak the MotionEstimator to adjust it to my problem, since that way I could probably predict a better movement in the first place, and so the objects would likely be matched better even using a simple euclidean_distance.
Another option is to use embeddings as the distance_function, but I would like to avoid this as it requires more fine-tuning.
I hope this makes everything more clear.
Finally, as a bonus, this is the motion estimation I'm getting.
As you can see, it's a bit messy; my guess is that the parameters of the MotionEstimator are not adequate for this movement and require fine-tuning.
@omar-ogm sorry for the late response, I was caught up with other problems.
First, let me tell you that this is a very interesting use case, an application of camera motion that we didn't think about when developing it.
Regarding the problem with tracking, as you mention, the source of it is that we initialize all objects with 0 velocity and since you had to lower the delay, the Kalman filter doesn't have time to keep up with the motion.
Using the MotionEstimator should solve this issue since your objects will indeed have 0 velocities if we correct for the movement of the train. So we just need to make the Estimator work.
In that regard, I have had problems when there are repeating patterns in the video, but mostly solved them by tweaking the min_distance and the max_points. Your video seems to be moving really fast, so that might be causing extra problems, as you mentioned.
To address this I just uploaded a branch that exposes the parameters you mentioned; you can install it with pip install git+https://github.com/tryolabs/norfair.git@customizable-optical-flow#egg=norfair. Hope that helps.
In addition, we would like to test a bit more. We looked for similar videos and found this one, but the camera motion works correctly on it, so there must be something else in your videos that's causing the issues. If you can share a snippet, we would love to test a bit more and try to improve the camera motion estimation.
Regarding using embeddings in the distance, I would avoid that: since your objects look quite similar, it will be really hard to get good embeddings.
Hi @javiber, thanks for your response!
Using the MotionEstimator should solve this issue since your objects will indeed have 0 velocities if we correct for the movement of the train. So we just need to make the Estimator work.
That's what I thought, but unfortunately I haven't been able to make it work yet. I tried specifying the parameters directly by modifying the library, but I could not get a good estimation. Norfair uses the following method to estimate the optical flow:
#
# Motion estimation
#
def _get_sparse_flow(
    gray_next,
    gray_prvs,
    prev_pts=None,
    max_points=300,
    min_distance=15,
    block_size=3,
    mask=None,
    quality_level=0.01,
):
    if prev_pts is None:
        # get points
        prev_pts = cv2.goodFeaturesToTrack(
            gray_prvs,
            maxCorners=max_points,
            qualityLevel=quality_level,
            minDistance=min_distance,
            blockSize=block_size,
            mask=mask,
        )

    # compute optical flow
    curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
        gray_prvs, gray_next, prev_pts, None, winSize=(21, 21), maxLevel=3
    )
    # filter valid points
    idx = np.where(status == 1)[0]
    prev_pts = prev_pts[idx].reshape((-1, 2))
    curr_pts = curr_pts[idx].reshape((-1, 2))
    return curr_pts, prev_pts
I tried modifying maxLevel and winSize (although I must say I'm not an expert in calcOpticalFlowPyrLK) but without luck. Maybe the patterns are too similar and the method is not able to match the key points from goodFeaturesToTrack properly. It could be a limitation of the algorithm that shows its weakness on my images.
I tried using ORB feature matching and it seems to work much better (see the next image); maybe it would be good to include a more robust feature matcher like ORB as an alternative in Norfair:
Here is the code I'm using to get the displacement with respect to the previous image (similar to what TranslationTransformation(movement_vector) requires):
import logging
import time

import cv2
import numpy as np


class ORBTracker:
    """
    Tracker to estimate the movement (optical flow) between two images using ORB matching.
    Compared to other optical flow methods, ORB has proven to be more robust and requires
    no fine-tuning of parameters.
    """

    def __init__(self):
        self.previous_frame = None
        self.kp = None
        self.des = None
        self.orb = cv2.ORB_create()
        self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def update(self, frame, debug=False):
        """
        Returns the median displacement of the image.
        Args:
            frame: BGR image.
            debug: if True, shows the matched keypoints between frames.
        Returns: success, displacement (x, y), with (0, 0) being the top-left corner of the image.
        """
        displacement = [0, 0]
        if self.previous_frame is None:
            self.previous_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            self.kp, self.des = self.orb.detectAndCompute(self.previous_frame, None)
            # Return an empty movement since it's the first time we update it.
            return True, displacement

        start_time = time.time_ns()
        new_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_kp, new_des = self.orb.detectAndCompute(new_frame, None)

        if self.des is None or new_des is None:
            success = False
            logging.debug("No descriptors could be found on image, no movement will be predicted")
        else:
            success = True
            matches = self.matcher.match(self.des, new_des)
            # Keep only the 50 matches with the lowest distance. This could be improved
            # (robustness) with the "ratio test" and sped up with a FLANN-based matcher, see:
            # https://docs.opencv.org/master/dc/dc3/tutorial_py_matcher.html
            matches_ord = sorted(matches, key=lambda x: x.distance)[:50]
            displacement = self.__compute_median_displacement(
                kp_old=self.kp, kp_new=new_kp, matches=matches_ord
            ).astype(int)
            print(f"ORB_TIME: {(time.time_ns() - start_time) / 1e6} ms")
            if debug:
                matched_image = cv2.drawMatches(
                    self.previous_frame, self.kp, new_frame, new_kp, matches_ord, None, flags=2
                )
                cv2.imshow("Matches", matched_image)
                cv2.waitKey()

        # Update previous with new data
        self.previous_frame = new_frame
        self.kp = new_kp
        self.des = new_des
        return success, displacement

    def __get_points_from_matches(self, kp_old, kp_new, matches):
        pts_old = []
        pts_new = []
        for match in matches:
            pts_old.append(kp_old[match.queryIdx].pt)
            pts_new.append(kp_new[match.trainIdx].pt)
        return pts_old, pts_new

    def __compute_median_displacement(self, kp_old, kp_new, matches):
        """
        Computes the median displacement of the image along the X and Y axes.
        Args:
            kp_old: keypoints from the previous frame
            kp_new: keypoints from the new frame
            matches: matches between those keypoints
        Returns: the median displacement in (x, y) as a 1x2 np.array
        """
        pts_old, pts_new = self.__get_points_from_matches(kp_old=kp_old, kp_new=kp_new, matches=matches)
        median_displacement = np.median(np.asarray(pts_new) - np.asarray(pts_old), axis=0)
        return median_displacement
In fact, I've been getting the median displacement over all those points and passing that as the movement_vector, instead of using the estimated motion from _get_sparse_flow(). With this, I've solved the problem I had, since the estimated movement_vector is quite accurate.
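For completeness, this is roughly how I plug it in (simplified; my_detector is a placeholder for my own detector, and the sign of the displacement may need flipping depending on whether it is interpreted as camera or background motion):

from norfair.camera_motion import TranslationTransformation

orb_tracker = ORBTracker()

for frame in frames:  # frames: my sequence of BGR images
    success, displacement = orb_tracker.update(frame)
    # build a translation-only transformation from the ORB median displacement
    coord_transformations = (
        TranslationTransformation(np.asarray(displacement, dtype=float)) if success else None
    )
    detections = my_detector(frame)  # placeholder: returns norfair.Detection objects
    tracked_objects = tracker.update(
        detections=detections, coord_transformations=coord_transformations
    )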
If you feel like integrating this way of matching points as an alternative to _get_sparse_flow, I could try to create a PR myself to contribute, if you want.
One last thing that I do not understand yet is the update of detections and tracked_objects. I would expect the tracked_objects to update their positions with the estimated motion (optical-flow based), BUT this doesn't seem to be the case. On the other hand, this update is being made in the Detection, and I can access it using detection.absolute_points instead of detection.points.
Example of Detection Motion Update
https://github.com/tryolabs/norfair/blob/3775b41e54b33d1f06ef869a35d3a84f1f3fefa9/norfair/tracker.py#L766
It's being updated as expected. One can access the "previous estimated position" of the detection with detection.absolute_points.
Tracker Motion Update
https://github.com/tryolabs/norfair/blob/3775b41e54b33d1f06ef869a35d3a84f1f3fefa9/norfair/tracker.py#L713
As you can see, nothing is happening there; I would expect the position estimation to happen there.
I can see that the update is happening in https://github.com/tryolabs/norfair/blob/3775b41e54b33d1f06ef869a35d3a84f1f3fefa9/norfair/tracker.py#L569, but when I access the tracker's estimate in the distance_function with return np.linalg.norm(detection.absolute_points - tracked_object.estimate), the estimate does not seem to include the optical flow update...
So, since I don't know how to access tracked_object.estimate with the optical flow applied, I access the previously estimated position of the Detection through absolute_points. This is a workaround I found, but I'm sure you can point me in the right direction.
In addition, we will like to test a bit more, we looked for similar videos and found this one but the camera motion works correctly in it so there must be something else on your videos that's causing the issues, if you can share a snippet we would love to test a bit more and try to improve the camera motion estimation.
Example Images
I don't have a video but I can share a set of images so you can try the MotionEstimator on this set of consecutive images.
I must say that I've been trying to make this work not with all the images but with only 1 out of 2, to simulate half the FPS and a higher displacement between images. If you use all the images, the Kalman filter alone is enough to handle the tracking, but using only the even/odd images is more challenging. (I've personally been using the even images.)
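(The subsampling itself is nothing fancy, something like the following, where all_image_paths is just the sorted list of image files:)

# keep 1 out of every 2 images to halve the effective FPS (even indices in my case)
images_to_use = sorted(all_image_paths)[::2]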
In any case, the optical flow from the method _get_sparse_flow() does not seem to work for this kind of images.
Link available for 1 week only! https://we.tl/t-upSBWDBFC1
That's interesting; I'll need to read more about ORB so that we can discuss how (and whether we want) to integrate it. I will add it to our backlog.
I tested the camera motion on your video (thanks for sharing) and it looks good even with default parameters.
I usually debug the camera motion using draw_absolute_grid.
In general, the optical flow is a bit messy, but the MotionEstimator is robust to some of the noise.
I made a video with the results here. The grid looks a bit funny because of how the camera is moving, but the important part is that the points should move along with the image: focus on a section and the points should "stick" to the background. In general they do; there's a bit of drift, but nothing crazy.
To pass the information about the camera movement to the TrackedObjects, you must do it through the Tracker's update method, like in this example.
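Something along these lines (a simplified sketch; the detector and the frame loop are placeholders):

from norfair.camera_motion import MotionEstimator
from norfair.drawing import draw_absolute_grid

motion_estimator = MotionEstimator()

for frame in video_frames:  # placeholder for your frame source
    detections = my_detector(frame)  # placeholder: returns norfair.Detection objects
    coord_transformations = motion_estimator.update(frame)
    tracked_objects = tracker.update(
        detections=detections,
        # this is what propagates the camera motion to the TrackedObjects
        coord_transformations=coord_transformations,
    )
    # debugging: the grid points should "stick" to the background as the camera moves
    draw_absolute_grid(frame, coord_transformations)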
This makes me think that the docs on camera motion should be improved with tips on how to use and debug it; another thing to add to our backlog.
Closing due to inactivity, feel free to re-open
Hi, first of all, thanks for this library; I think it's top tier when it comes to tracking algorithms.
I'm currently struggling to make it work with big movements, i.e. large displacements between frames. Unlike most cases, the tracking I require is not just a few pixels of movement; the objects may move by around 170 pixels along the Y axis (upwards). My image is 1280x720 pixels, so that's quite a large fraction of the frame.
The first problem was that the Kalman filter could not keep up, but I managed to make it work more or less. Now I'm trying to add the optical flow information into the estimations.
The problem I'm facing now is that the integrated optical flow used by the MotionEstimator class and the HomographyTransformationGetter does not work properly, because its hyperparameters are configured for a typical scenario, and in my case I need more flexibility.
I think the problem lies within the method "_get_sparse_flow()" from camera_motion.py. That method uses cv2.calcOpticalFlowPyrLK to match the points between the previous image and the new image, but it uses the default parameters and I did not find a way to change them. These values are a winSize of 21 and a maxLevel of 3. From my understanding, that builds a three-level image pyramid, so the 21-pixel winSize corresponds to something like 84 pixels in the last image, 4 times bigger since that image is 4 times smaller.
This is where the problem is: my objects move about 170 pixels between frames, and so do most of the features from goodFeaturesToTrack, but the area to search is only about 84 pixels. That means that cv2.calcOpticalFlowPyrLK is going to match incorrect points, producing a bad homography or translation.
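For illustration, this is the kind of call I would like to be able to configure (same variables as in _get_sparse_flow; the values are just guesses, not tested recommendations):

curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
    gray_prvs,
    gray_next,
    prev_pts,
    None,
    winSize=(41, 41),  # wider search window per pyramid level
    maxLevel=4,        # an extra level halves the apparent motion once more
)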
First, I would like to know if this makes sense or if I'm making a mistake. Second, whether there is something I can do to handle big movements with Norfair besides using the MotionEstimator. Third, if my assumptions are correct, could we expose the parameters of cv2.calcOpticalFlowPyrLK in the MotionEstimator class, as we currently can for cv2.goodFeaturesToTrack?