pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
MIT License
345 stars 68 forks

Source of 2D Detections provided by CLOCs #47

Open nazMahmoud opened 2 years ago

nazMahmoud commented 2 years ago

Hello @pangsu0613! For reproducing results on second, we need to use 3D detections from a pretrained SECOND model and also 2D detections from Cascaded RCNN. You mention the following with regards to extracting 2D detections:

"_For this example, we use detections with sigmoid scores, you could download the Cascade-RCNN detections for the KITTI train and validations set from here file name:'cascade_rcnn_sigmoiddata'"

My question is about the model that predicted the 'cascade_rcnn_sigmoid_data' 2D detections: is this model trained on training+validation, or only on the training set? When I looked at the cascaded-rcnn repo, it only provides models pretrained on training and validation sets. I do not think they provide models trained only on the training set. Could you please clarify this?

pangsu0613 commented 2 years ago

Hello @nazMahmoud. The model that predicted 'cascade_rcnn_sigmoid_data' is trained on the training set (3712 examples). You are right, the cascaded-rcnn repo doesn't provide that, so I trained it myself.

nazMahmoud commented 2 years ago

Thank you @pangsu0613 for your support. That makes more sense. Do you have predictions from the other models (RRC and MSCNN) trained only on the training set?

Eaphan commented 2 years ago

> Hello @nazMahmoud. The model that predicted 'cascade_rcnn_sigmoid_data' is trained on the training set (3712 examples). You are right, the cascaded-rcnn repo doesn't provide that, so I trained it myself.

Since the model is trained on the train set, its predicted scores on the train set will be higher than on the val set. Will this affect the final detection performance?

pangsu0613 commented 2 years ago

Hello @Viczyf, exactly, you are right. So, ideally, I should have two training sets: training set 1 for training the 2D and 3D detectors, and training set 2 for CLOCs fusion; that is the best setup. However, KITTI only provides 7481 frames for training, and I already allocated 3712 for training the 3D and 2D detectors, so there isn't much data left (3769) to build another training set. We still could, for example, take 2000 out of the remaining 3769 for training CLOCs and use only 1769 for validation, but I am afraid that validation set would be too small for a comprehensive evaluation. Also, if we want to run inference on the test set, the 3D and 2D detectors require more training data (perhaps the whole 7481 frames). Therefore, in conclusion, back to your question: yes, I think it will affect the final performance, but I don't have that much data for training. That said, based on the results I get, CLOCs can still improve 3D/2D performance even with this slightly ill-posed training data setup.
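The three-way partition described above could be sketched as follows. This is only an illustration of the arithmetic (3712 for the detectors, 2000 for CLOCs, 1769 for validation), not code from the CLOCs repo; the `split_kitti_frames` helper and the random shuffle are assumptions, since the official KITTI split uses fixed `train.txt`/`val.txt` index lists rather than a random partition.

```python
import random

def split_kitti_frames(all_frames, n_detector=3712, n_clocs=2000, seed=0):
    """Partition KITTI frame ids into detector-train / CLOCs-train / val subsets.

    Illustrative only: the real split would come from fixed index files,
    not a random shuffle.
    """
    rng = random.Random(seed)
    frames = list(all_frames)
    rng.shuffle(frames)
    detector_train = frames[:n_detector]
    clocs_train = frames[n_detector:n_detector + n_clocs]
    val = frames[n_detector + n_clocs:]
    return detector_train, clocs_train, val

# KITTI provides 7481 training frames in total.
det, clo, val = split_kitti_frames(range(7481))
print(len(det), len(clo), len(val))  # 3712 2000 1769
```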

Eaphan commented 2 years ago

Hi @pangsu0613, thanks for your reply. I have another question: do you plan to publish a codebase supporting other 3D detectors (PV-RCNN and CT3D)? Or do you have any suggestions on applying CLOCs to other 3D detectors? I failed to improve PV-RCNN with CLOCs. I checked the encoding of the bounding boxes and used the logits before the sigmoid function for both 2D and 3D candidates, as mentioned in https://github.com/pangsu0613/CLOCs/issues/31

Maybe the code should be changed, since the classification branches of PV-RCNN and CT3D take IoU as their targets.
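For the "logits before sigmoid" step mentioned above, a minimal sketch of the conversion might look like this. This is not from the CLOCs codebase: `score_to_logit` is a hypothetical helper applying the standard inverse sigmoid, with scores clipped away from 0 and 1 so the logarithm stays finite. Note that if a detector's score is an IoU estimate rather than a calibrated class probability (as the comment suggests for PV-RCNN/CT3D), applying this inverse sigmoid would not recover a meaningful classification logit.

```python
import math

def score_to_logit(score, eps=1e-6):
    """Inverse sigmoid: logit(p) = log(p / (1 - p)).

    Clips the input to (eps, 1 - eps) to avoid infinite values
    at exactly 0 or 1.
    """
    p = min(max(score, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

print(score_to_logit(0.5))  # 0.0
print(score_to_logit(0.9))  # log(9) ≈ 2.1972
```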