tryolabs / norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.
https://tryolabs.github.io/norfair/
BSD 3-Clause "New" or "Revised" License

After two objects collide, IDs are exchanged. #288

Closed: armine105 closed this issue 9 months ago

armine105 commented 10 months ago

Hello! How are you? I am using your library for pool ball tracking. In most cases it works well, but sometimes, after two balls collide, their IDs get exchanged.

[screenshots showing the ID swap after a collision]

aguscas commented 10 months ago

Hello! This is a common problem in tracking, but I have a few suggestions you can try. The main reason it happens is that we estimate the velocity of each object while tracking it, in order to predict where it should be in the following frame. The sudden change in velocity during a collision makes the tracking more challenging.

These are a couple things you might want to try:

  1. Don't use the Kalman filter: when initializing your tracker, you can set the argument filter_factory to NoFilterFactory() so that the velocity is not used.

    from norfair import Tracker
    from norfair.filter import NoFilterFactory

    tracker = Tracker(
        distance_function=distance_function,
        detection_threshold=DETECTION_THRESHOLD,
        distance_threshold=DISTANCE_THRESHOLD,
        hit_counter_max=HIT_COUNTER_MAX,
        initialization_delay=INITIALIZATION_DELAY,
        pointwise_hit_counter_max=POINTWISE_HIT_COUNTER_MAX,
        filter_factory=NoFilterFactory(),  # with this line, you wouldn't be using the velocity
    )

    Remember that doing this might cause other problems when the balls are moving normally, since we would use only the last position of the ball (and not its velocity) to estimate where it should be in the following frame. You might want to change your distance_function and DISTANCE_THRESHOLD accordingly. I don't know which distance function you are currently using, but I would suggest, for example, using the euclidean distance between the centers of the balls instead of IoU, since the boxes of the predicted position and the detected position might not even overlap if a ball moves too fast; a sketch of such a function follows. If you need any help with this, please let me know.
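
    A minimal sketch of such a center-based distance (untested, and assuming your detections are boxes given as two corner points):

    import numpy as np

    def center_distance(detection, tracked_object):
        # the center of a box is the mean of its two corner points
        det_center = detection.points.mean(axis=0)
        obj_center = tracked_object.estimate.mean(axis=0)
        return np.linalg.norm(det_center - obj_center)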

  2. You can try using a distance function that considers an embedding of how the object looks. So, in a similar fashion (but not exactly the same) to what was done in the ReID demo, you can compare how similar two objects look instead of just their positions on the screen.

So, for example, for each detection you could crop the bounding box, compute its color histogram with OpenCV, and normalize it:

import cv2

def get_hist(image):
    # 2D histogram over the first two channels of the Lab image
    hist = cv2.calcHist(
        [cv2.cvtColor(image, cv2.COLOR_BGR2Lab)],
        [0, 1],
        None,
        [128, 128],
        [0, 256, 0, 256],
    )
    return cv2.normalize(hist, hist).flatten()

and put that in the embedding attribute of the detection

for detection in detections:
    # get_cutout would be a function that crops the bounding box out of the frame
    cut = get_cutout(detection.points, frame)
    if cut.shape[0] > 0 and cut.shape[1] > 0:
        detection.embedding = get_hist(cut)
    else:
        detection.embedding = None

Then you can define a custom distance_function that compares the embedding of the detection in the current frame with the embedding of the last_detection associated with a tracked object (or with a subset of its past_detections). You can even combine the spatial distance and the similarity of the embeddings, either by simply multiplying or adding the two values, or by using conditionals.

For example: compare the embeddings first, and if the objects look similar, compute the spatial distance and use that as the final distance; if they don't look similar, set the final distance to a very large number so that they never match. Or do it the other way around: if the objects are spatially close to each other, compute the similarity of the embeddings and use that as the final distance; if they are far apart, return a very large value without even looking at the embeddings.

The following is an example just using the embedding similarity with the most recent past_detection:

def your_custom_distance_function(detection: "Detection", tracked_object: "TrackedObject") -> float:
    # find the most recent past detection that has an embedding
    for past_detection in reversed(tracked_object.past_detections):
        if past_detection.embedding is not None:
            last_embedding = past_detection.embedding
            break
    else:
        return 1

    if detection.embedding is None:
        return 1

    # 1 - correlation: close to 0 when the histograms are similar,
    # larger when they are not
    return 1 - cv2.compareHist(
        last_embedding, detection.embedding, cv2.HISTCMP_CORREL
    )
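
And a sketch of the first conditional variant described above, combining this with the center_distance helper from the first suggestion (again untested; EMBEDDING_DISTANCE_THRESHOLD is a hypothetical cutoff you would have to tune for your data):

EMBEDDING_DISTANCE_THRESHOLD = 0.5  # hypothetical cutoff, tune for your data

def combined_distance(detection: "Detection", tracked_object: "TrackedObject") -> float:
    # if the appearances don't match, return a huge distance so the pair never matches
    if your_custom_distance_function(detection, tracked_object) > EMBEDDING_DISTANCE_THRESHOLD:
        return 1e9
    # otherwise fall back to the spatial distance between the centers
    return center_distance(detection, tracked_object)

Note that for past_detections to be populated, the tracker has to be storing them (see the past_detections_length argument of Tracker).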

Please note that I haven't tested any code snippet here, so there might be some problems to address when trying to run them. If you need any further assistance, please let me know.

rose-jinyang commented 10 months ago

Thanks for your full guidance.

armine105 commented 10 months ago

Hi @aguscas, thank you very much! I fixed the issue by revising the mean_euclidean function as follows.

[screenshot of the revised mean_euclidean function]

In most cases it works well. That is, when associating a detection with an already existing tracked object, I used the tracked object's last detected position rather than the estimated next position.
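
In other words, something along these lines (a sketch based on the description above, not the exact code from the screenshot):

import numpy as np

def mean_euclidean_last_detection(detection, tracked_object):
    # like Norfair's built-in mean_euclidean, but compare against the
    # tracked object's last detected position (last_detection.points)
    # instead of the filter's estimated next position (estimate)
    return np.linalg.norm(
        detection.points - tracked_object.last_detection.points, axis=1
    ).mean()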

aguscas commented 9 months ago

Nice! That was a great idea too!