xingyizhou / CenterTrack

Simultaneous object detection and tracking using center points.
MIT License

About image augmentation for coco static image #25

Closed liangxiao05 closed 4 years ago

liangxiao05 commented 4 years ago

Hi authors, thanks for your nice work and for sharing the code. I'm a big fan of your CenterNet architecture. I recently read your new CenterTrack paper and saw that you simulate tracking frames from COCO static images purely through image augmentation and get a large accuracy improvement. I want to try this in ordinary object detection tasks to see whether it still works, but when I looked into the code I couldn't find the part you refer to. Can you give more details about where this code is? Thank you!

liangxiao05 commented 4 years ago

One more thing: I find that the coco_tracking model is only 79.5 MB, yet it generalizes better on object detection tasks than some larger, well-known models such as EfficientDet when tested on real-world scenes. How do you train this model, and do you use any special tricks during training?

xingyizhou commented 4 years ago

Thank you for liking our projects. The code for generating a fake previous frame by augmentation is here.
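
If it helps to see the idea concretely, here is a minimal sketch of how a static image and its boxes can be warped to fabricate a "previous" frame. This is not the repository's actual code: the function name `fake_previous_frame`, its parameters, and the use of OpenCV are my own assumptions for illustration.

```python
# Hypothetical sketch (not CenterTrack's code): simulate a previous frame
# from a single static image with a random scale + shift, so a tracker can
# be trained on image-only datasets such as COCO.
import numpy as np
import cv2

def fake_previous_frame(image, boxes, max_shift=0.05, max_scale=0.05):
    """Warp `image` (H x W x 3) and its boxes (N x 4, x1y1x2y2) with a
    random similarity transform to mimic inter-frame motion."""
    h, w = image.shape[:2]
    scale = 1.0 + np.random.uniform(-max_scale, max_scale)
    dx = np.random.uniform(-max_shift, max_shift) * w
    dy = np.random.uniform(-max_shift, max_shift) * h
    # Affine matrix: scale about the image center, then translate.
    cx, cy = w / 2.0, h / 2.0
    M = np.array([[scale, 0.0, (1 - scale) * cx + dx],
                  [0.0, scale, (1 - scale) * cy + dy]], dtype=np.float32)
    prev_image = cv2.warpAffine(image, M, (w, h))
    # Transform the box corners with the same matrix.
    corners = boxes.reshape(-1, 2).astype(np.float32)   # (2N, 2) of (x, y)
    ones = np.ones((corners.shape[0], 1), dtype=np.float32)
    warped = (np.hstack([corners, ones]) @ M.T).reshape(-1, 4)
    prev_boxes = np.clip(warped, 0, [w - 1, h - 1, w - 1, h - 1])
    return prev_image, prev_boxes
```

The warped image plays the role of the previous frame, and the warped boxes provide the "previous" centers used to render the prior heatmap during training.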

We are glad to hear that CenterNet works better than EfficientDet in your scenarios. One detail that might matter is that in CenterNet we can handle ignored annotations ("iscrowd" labels in COCO) easily, by simply masking those regions out of the ground-truth heatmap loss. I didn't see this handled in detectron2 or mmdetection. It doesn't improve COCO AP, but it may help the model generalize better in the real world.
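
For illustration only, here is a minimal sketch of what that masking could look like in a CenterNet-style focal loss. The function name, signature, and mask convention are my own assumptions, not the repository's implementation.

```python
# Hypothetical sketch: ignore crowd regions in a CenterNet-style focal loss
# by zeroing their contribution with a per-pixel mask, instead of treating
# them as background negatives.
import torch

def masked_center_focal_loss(pred, gt, ignore_mask, alpha=2, beta=4):
    """pred, gt: (B, C, H, W) heatmaps; ignore_mask: (B, 1, H, W),
    1 where the pixel should contribute to the loss, 0 inside
    `iscrowd` / ignore regions."""
    pred = pred.clamp(1e-4, 1 - 1e-4)
    pos = gt.eq(1).float()
    neg = (1 - pos) * torch.pow(1 - gt, beta)
    pos_loss = torch.log(pred) * torch.pow(1 - pred, alpha) * pos
    neg_loss = torch.log(1 - pred) * torch.pow(pred, alpha) * neg
    # Mask out ignore regions so they count neither as positives nor negatives.
    pos_loss = pos_loss * ignore_mask
    neg_loss = neg_loss * ignore_mask
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```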

123alaa commented 4 years ago

@xingyizhou about the same issue, I have a question:

  1. I have a dataset annotated at a low frame rate, and I don't have a tracking ID for each object, but I still want the network to predict tracking IDs. How can I do that?

In the paper you said: "Training on static images. We train a version of our model on static images only, as described in Section 4.4. The results are shown in Table 5 (3rd row, ‘Static image’). As reported in this table, training on static images gives the same performance as training on videos on the MOT dataset. Separately, we observed that training on static images is less effective on nuScenes, where framerate is low."

Is that right? In any case, can you point me to the experiment where you did something like this?

liangxiao05 commented 4 years ago

@xingyizhou Thanks for your explanations.

xingyizhou commented 4 years ago

@123alaa The nuScenes ablation is not included in the provided experiment scripts. I used the following command:

```bash
python main.py tracking,ddd --exp_id nuScenes_3Dtracking_static --dataset nuscenes --pre_hm \
  --load_model ../models/nuScenes_3Ddetection_e140.pth \
  --shift 0.01 --scale 0.05 --lost_disturb 0.4 --fp_disturb 0.1 --hm_disturb 0.05 \
  --batch_size 64 --gpus 0,1,2,3 --lr 2.5e-4 --save_point 60 --max_frame_dist 1
```

I haven't tuned the augmentation parameters heavily for this experiment. Intuitively, --shift and --scale should be larger and should ideally match the inter-frame displacement of the dataset.
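
For intuition on what the disturbance flags control, here is a rough sketch of how a noisy prior heatmap can be rendered from previous-frame centers during training: --hm_disturb jitters each center, --lost_disturb randomly drops objects, and --fp_disturb adds nearby false positives. This is my own simplification, not the repository's code; the helper `draw_gaussian`, the use of the Gaussian radius as the jitter scale, and the function signature are all illustrative assumptions.

```python
# Hypothetical sketch of rendering a noisy prior heatmap for training.
import numpy as np

def draw_gaussian(hm, center, radius):
    """Splat an unnormalized 2D Gaussian of the given radius onto hm."""
    h, w = hm.shape
    x0, y0 = float(center[0]), float(center[1])
    sigma = max(radius / 3.0, 1.0)
    ys, xs = np.ogrid[:h, :w]
    g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    np.maximum(hm, g, out=hm)

def render_prior_heatmap(centers, radii, shape,
                         hm_disturb=0.05, lost_disturb=0.4, fp_disturb=0.1):
    """centers: list of (x, y) previous-frame object centers;
    radii: per-object Gaussian radius; shape: (H, W) of the heatmap."""
    hm = np.zeros(shape, dtype=np.float32)
    for (x, y), r in zip(centers, radii):
        if np.random.random() < lost_disturb:
            continue                            # simulate a missed detection
        # Jitter the center proportionally to the object size.
        x = x + np.random.randn() * hm_disturb * r
        y = y + np.random.randn() * hm_disturb * r
        draw_gaussian(hm, (x, y), r)
        if np.random.random() < fp_disturb:
            # Simulate a false positive near the object.
            fx = x + np.random.randn() * r
            fy = y + np.random.randn() * r
            draw_gaussian(hm, (fx, fy), r)
    return hm
```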

xingyizhou commented 4 years ago

Closing this for now. Feel free to reopen if you have further questions.