xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
http://www.xinshuoweng.com/

Visualization is better if the data is not classified #83

Closed yanshuaibupt closed 2 years ago

yanshuaibupt commented 2 years ago

First of all, thank you for your excellent work. When I finished reading SORT, I also had the idea of a "3D SORT", but while surveying the literature I found your excellent paper. My problem is this: when I run matching separately on the pedestrian and car data, visualization shows that when the two objects occlude each other, the ID switches. When I put all the data into matching together, short-term occlusion no longer causes ID switches. In addition, I assigned the type of all data to 1; how do I run evaluation in this case? Directly running the original evaluation code gives 0, but aMOTP is 0.8667.

xinshuoweng commented 2 years ago

Hey, that is a good point! Actually, this is different from what I tried before. When I do matching for all categories together, I get worse performance. This is because different object categories have different sizes and motion dynamics, so each needs a different set of matching thresholds/metrics to achieve optimal performance.

But, to answer your question: you can simply merge all data into a single folder and then do matching altogether. However, you should not assign the type of all data to 1, because our current evaluation code only runs evaluation per category (matching against GT objects of the same category), so the category label of each tracklet is needed. This is why you get bad performance after running the evaluation.
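The per-category evaluation described above could be sketched as follows. This is a minimal illustration, not AB3DMOT's actual evaluation API: the tracklet tuple layout and the `group_by_category` helper are hypothetical, but they show why the category label must survive into the results.

```python
from collections import defaultdict

def group_by_category(tracklets):
    """Split (frame, track_id, category, box) tuples by category so each
    category can be matched against GT objects of the same category.
    Assigning every tracklet the same type collapses these groups and
    breaks per-category evaluation.
    """
    groups = defaultdict(list)
    for frame, track_id, category, box in tracklets:
        groups[category].append((frame, track_id, box))
    return dict(groups)

# Hypothetical tracking output: two cars frames apart, one pedestrian.
tracklets = [
    (0, 1, 'Car', (0.0, 0.0, 0.0)),
    (0, 2, 'Pedestrian', (5.0, 0.0, 0.0)),
    (1, 1, 'Car', (0.5, 0.0, 0.0)),
]
groups = group_by_category(tracklets)
print(sorted(groups))  # ['Car', 'Pedestrian']
```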

yanshuaibupt commented 2 years ago

I ran the evaluation on all KITTI training data, and the results are: car sAMOTA=0.93, pedestrian sAMOTA=0.74, and overall mean MOTA=0.9108 (using motmetrics on all 21 training sequences). I think different object sizes do not affect the 3D IoU value, because all boxes live in the same 3D coordinate frame: a pedestrian and a car can never overlap in theory.
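The 3D IoU argument can be illustrated with a minimal axis-aligned sketch (AB3DMOT's real boxes are oriented, so this is a simplification; the box sizes below are made up):

```python
def iou_3d_axis_aligned(a, b):
    """Axis-aligned 3D IoU for boxes given as (x, y, z, l, w, h):
    center coordinates followed by extents along each axis."""
    def overlap(c1, s1, c2, s2):
        lo = max(c1 - s1 / 2, c2 - s2 / 2)
        hi = min(c1 + s1 / 2, c2 + s2 / 2)
        return max(0.0, hi - lo)

    inter = 1.0
    for i in range(3):  # intersection volume as product of per-axis overlaps
        inter *= overlap(a[i], a[i + 3], b[i], b[i + 3])
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0

# A car and a pedestrian at physically distinct positions: zero overlap,
# so mixing categories does not create spurious matches between them.
car = (0.0, 0.0, 0.0, 4.0, 1.8, 1.5)
ped = (6.0, 0.0, 0.0, 0.8, 0.8, 1.8)
print(iou_3d_axis_aligned(car, ped))  # 0.0
```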

xinshuoweng commented 2 years ago

From these numbers, it seems worse than separate evaluation as well, especially for the pedestrian. But I am not sure which evaluation criteria you are using, and the evaluation was done on all 21 sequences rather than only the validation sequences, so the results may vary. To be clear, my original point was not that we get different 3D IoU values when tracking all categories together. Instead, the point was that the optimal matching threshold for different categories might differ, as can be seen here for example:

https://github.com/xinshuoweng/AB3DMOT/blob/933e4af2ef4c04a7c970f9abf8ae0dc2739ab77e/AB3DMOT_libs/model.py#L53-L57
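The linked lines select matching parameters per category; a hedged sketch of that idea is below. The category keys follow KITTI naming, but the metric names and threshold values are illustrative placeholders, not the tuned numbers from `AB3DMOT_libs/model.py`.

```python
# Illustrative per-category matching configuration. Values are made up
# for this sketch, not the tuned settings from the AB3DMOT repo.
MATCH_CONFIG = {
    'Car':        {'metric': 'giou_3d', 'threshold': -0.2},
    'Pedestrian': {'metric': 'dist_3d', 'threshold': 1.0},
    'Cyclist':    {'metric': 'dist_3d', 'threshold': 2.0},
}

def get_match_params(category):
    """Look up the matching metric and threshold for a category,
    falling back to the Car settings for unknown categories."""
    return MATCH_CONFIG.get(category, MATCH_CONFIG['Car'])

print(get_match_params('Pedestrian')['metric'])  # dist_3d
```

Tracking all categories with a single threshold forces one value to serve both large, fast cars and small, slow pedestrians, which is why per-category tuning tends to score better.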