More Questions about fusion of PointRCNN and Cascade R-CNN

pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

MIT License

Hello @pangsu0613! Four months ago, I successfully fused the candidates of PointRCNN and Cascade R-CNN using the 2D candidates you provided. Now I want to train Cascade R-CNN from scratch and produce the 2D candidates myself. I have two questions.

1. I use the mmdetection codebase to train Cascade R-CNN. After the refinement by the three detection heads, I get 1000 candidates. By default, the score threshold before NMS is 0.05, the IoU threshold in NMS is 0.5, and the score threshold after NMS is 0.3. If I just set the score threshold before NMS to zero and save the candidates immediately after NMS, I may get more than 200 candidates. So I made two attempts. The first was to save the top 100 candidates with the highest scores. The second was to set the NMS IoU threshold to 0.2. Both attempts led to worse fusion results than the candidates you provided. Can you tell me the correct configuration?
2. I find that the training process of CLOCs is unstable. For example, the evaluation results (AP) for epochs 1 to 5 are 80.9767, 82.8649, 83.3775, 81.5447, and 82.4789, respectively. How can I make the training process more stable?

I am looking forward to your reply. Thank you in advance!
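
For readers following along, the candidate-selection variants described in the first question can be sketched as below. This is a minimal single-class illustration in NumPy, not mmdetection's implementation and not the configuration used to produce the provided candidates. The function names (`iou_one_to_many`, `select_candidates`) and the toy boxes are hypothetical. Only the threshold values (0.05, 0.5, 0.2, top-k) come from the question, with a top-k of 2 standing in for 100 on the toy data.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def select_candidates(boxes, scores, score_thr=0.0, iou_thr=0.5, max_candidates=None):
    """Greedy NMS: return indices of kept boxes, sorted by descending score.

    score_thr      -- boxes with score <= score_thr are dropped before NMS
    iou_thr        -- a box is suppressed if its IoU with a kept box exceeds this
    max_candidates -- optional cap applied after NMS (hypothetical parameter name)
    """
    idx = np.flatnonzero(scores > score_thr)
    idx = idx[np.argsort(-scores[idx], kind="stable")]
    kept = []
    while idx.size > 0:
        best, rest = idx[0], idx[1:]
        kept.append(best)
        ious = iou_one_to_many(boxes[best], boxes[rest])
        idx = rest[ious <= iou_thr]
    kept = np.array(kept, dtype=int)
    if max_candidates is not None:
        kept = kept[:max_candidates]
    return kept

# Toy data (made up for illustration): three overlapping boxes and one low-score box.
boxes = np.array([
    [0.0, 0.0, 10.0, 10.0],
    [1.0, 1.0, 11.0, 11.0],
    [5.0, 0.0, 15.0, 10.0],
    [20.0, 20.0, 30.0, 30.0],
])
scores = np.array([0.9, 0.8, 0.7, 0.02])

default_keep = select_candidates(boxes, scores, score_thr=0.05, iou_thr=0.5)
no_score_thr = select_candidates(boxes, scores, score_thr=0.0, iou_thr=0.5)
top_k_keep = select_candidates(boxes, scores, score_thr=0.0, iou_thr=0.5, max_candidates=2)
low_iou_keep = select_candidates(boxes, scores, score_thr=0.0, iou_thr=0.2)

print(default_keep, no_score_thr, top_k_keep, low_iou_keep)
```

Note that the two attempts prune differently: the top-k cap discards the lowest-scoring survivors, while a lower IoU threshold discards moderately overlapping boxes regardless of score.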

Hello @rkotimi, thank you for your interest in CLOCs! I am glad that CLOCs works for you. Did you achieve better detection performance by fusing PointRCNN with the provided Cascade R-CNN detections?

1. Based on my experience, I would first suggest evaluating the mmdetection version of Cascade R-CNN (with NMS) on KITTI to see whether your training went well. I have trained some 2D detectors (such as Faster R-CNN) on the KITTI dataset, and they performed poorly there. Some networks perform well on COCO or other datasets but badly on KITTI, so I guess some tuning of the network may be needed. CLOCs needs decent 2D and 3D detectors. If the 2D detections are of very low quality, they can spoil the fusion. Another point is that the 2D detection "candidates" are not as important as the 3D detection "candidates", because the final output is 3D detections. So if this takes too much effort, it is fine to use the final 2D detection results (with NMS).
2. Yes, you are right, I have noticed this issue as well. I don't have a solution for now. One potential reason is that KITTI is a relatively small dataset. I have tested CLOCs on a larger dataset, and it behaves slightly better there.

More Questions about fusion of PointRCNN and Cascade R-CNN #46