Nathan-Li123 closed this 2 weeks ago
Thanks for the question. Yes, classic MOT methods usually use classification results as a heuristic on datasets such as MOT, KITTI, and nuScenes: they only associate objects that have the same class prediction. However, we found that this heuristic does not work well with a large open vocabulary, because the classification predictions on videos are very inconsistent across frames. Thus, we do not use any classification results for matching. The final category of a trajectory is decided by a simple majority vote over its detections.
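For readers who want to see what such a vote could look like, here is a minimal sketch in Python. It is not the code from the repository: the function name `track_category` and the tie-breaking rule (the label seen first wins) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, Iterable


def track_category(labels: Iterable[Hashable]) -> Hashable:
    """Return the most frequent per-frame label of one trajectory.

    Hypothetical helper, not part of the project's API.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("trajectory has no detections")
    # most_common(1) returns [(label, count)]; on a tie, Counter keeps
    # the label that was encountered first (an assumed tie-break rule).
    return counts.most_common(1)[0][0]


# One trajectory whose per-frame class predictions disagree.
per_frame_labels = ["dog", "wolf", "dog", "coyote", "dog"]
print(track_category(per_frame_labels))  # prints "dog"
```

Because matching ignores the class predictions, the vote only runs once per trajectory, after association is finished.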
Thanks
Your work is excellent and very inspiring! I have a question: when the model performs the matching operation, it seems that the classification results are not considered. In other words, a single trajectory might include detections from different categories. So how is the final category of that trajectory determined? Is it decided through majority vote?