pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

Considering 3D detection if there is no corresponding 2D detection #21

Closed vignesh628 closed 3 years ago

vignesh628 commented 3 years ago

Hello @pangsu0613. In the research paper you mention that a 3D detection is kept even if there is no corresponding 2D detection. But what happens if there is a 2D detection and no corresponding 3D detection?

I have taken this image from your research paper. In it there is a 2D detection but no 3D detection, yet the output still shows a 3D box. Can you provide clarity on this?

A few more queries:

  1. If we feed the Velodyne point clouds as input for the 3D detections, do we need pickle files like kitti_infos_train for the calibration information?

  2. `Car -1 -1 -10 765.27 107.09 816.20 136.49 -1 -1 -1 -1000 -1000 -1000 -10 0.0000` — can you also explain this output format from Cascade R-CNN that appears in the pickle file for the 2D detections?

  3. Is there a straightforward way to use the 3D detections directly, without using SECOND, so that we can feed the 2D and 3D detections straight into CLOCs and run it? Thanks...

pangsu0613 commented 3 years ago

Hello @vignesh628, if there is a 2D detection but no 3D detection, we simply ignore it. The final output is 3D detections, and there is currently no good technique for generating accurate 3D detections from 2D detections (it can be done, but the generated 3D detections contain too much error). Also, based on our statistics (for Cascade R-CNN and SECOND), the number of objects missed by the 3D detector (SECOND) but detected by the 2D detector (Cascade R-CNN) is much, much smaller than the number of objects detected by the 3D detector but missed by the 2D detector (I remember it being around 50 vs. 2000+ on the KITTI validation set), so it is fine to ignore them.
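To illustrate the idea, here is a rough NumPy sketch of that matching behaviour (not the actual CLOCs code; `iou_fn` and the dictionary layout are just placeholders): every 3D candidate enters the fusion whether or not it has a 2D match, while 2D-only detections never contribute.

```python
import numpy as np

def build_fusion_pairs(boxes_3d_projected, scores_3d, boxes_2d, scores_2d, iou_fn):
    """Pair every projected 3D candidate with the 2D candidates it overlaps.

    boxes_3d_projected: (N, 4) image-plane boxes of the projected 3D candidates
    boxes_2d:           (M, 4) 2D detector boxes
    iou_fn:             callable computing IoU between two (x1, y1, x2, y2) boxes
    """
    pairs = []
    for i, box3d in enumerate(boxes_3d_projected):
        ious = np.array([iou_fn(box3d, box2d) for box2d in boxes_2d])
        matched = np.flatnonzero(ious > 0.0)   # overlapping 2D candidates; may be empty
        pairs.append({
            "idx_3d": i,                       # every 3D candidate is kept, matched or not
            "score_3d": scores_3d[i],
            "matched_2d": matched,
            "ious": ious[matched],
            "scores_2d": [scores_2d[j] for j in matched],
        })
    return pairs                               # 2D detections with no 3D match are never used
```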

pangsu0613 commented 3 years ago

If we feed the Velodyne point clouds as input for the 3D detections, do we need pickle files like kitti_infos_train for the calibration information?

Answer: kitti_infos_train is used by the dataloader and the rest of the SECOND codebase, for example to prepare the ground-truth labels for training. I have plans to remove these dependencies and decouple CLOCs from SECOND.

`Car -1 -1 -10 765.27 107.09 816.20 136.49 -1 -1 -1 -1000 -1000 -1000 -10 0.0000` — can you also explain this output format from Cascade R-CNN that appears in the pickle file for the 2D detections?

Answer: this is the KITTI label format. The detailed definition can be found in the readme of the KITTI object detection development kit; you can download the devkit from http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d by clicking "Download object development kit (1 MB)" on that page.
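Roughly, the 16 whitespace-separated fields follow the layout below; the -1 / -1000 / -10 entries are placeholder values for 3D fields that a 2D-only detector cannot fill. A small parsing sketch (the helper name is just for illustration, not from the repo):

```python
def parse_kitti_detection(line):
    """Parse one detection line in KITTI label format (16 fields when a score is appended)."""
    f = line.split()
    return {
        "type":       f[0],                          # e.g. 'Car'
        "truncated":  float(f[1]),                   # -1: not provided by the 2D detector
        "occluded":   int(float(f[2])),              # -1: not provided
        "alpha":      float(f[3]),                   # observation angle; -10: not provided
        "bbox":       [float(x) for x in f[4:8]],    # 2D box: left, top, right, bottom (pixels)
        "dimensions": [float(x) for x in f[8:11]],   # 3D height, width, length (m); -1: not provided
        "location":   [float(x) for x in f[11:14]],  # 3D x, y, z in camera coords; -1000: not provided
        "rotation_y": float(f[14]),                  # yaw angle; -10: not provided
        "score":      float(f[15]) if len(f) > 15 else None,
    }

det = parse_kitti_detection(
    "Car -1 -1 -10 765.27 107.09 816.20 136.49 -1 -1 -1 -1000 -1000 -1000 -10 0.0000")
print(det["bbox"], det["score"])  # only the 2D box and the confidence carry real information here
```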

Is there a straightforward way to use the 3D detections directly, without using SECOND, so that we can feed the 2D and 3D detections straight into CLOCs and run it?

Answer: I plan to turn CLOCs into an open library that does not depend on the SECOND codebase and is easier to use. I'll let you know about the updates in the near future. Thank you for your interest in CLOCs.

vignesh628 commented 3 years ago

@pangsu0613 Thanks for the detailed explanation. Does CLOCs output only the predicted scores for the 3D detections, or does it also give the bounding box information? The paper mentions only the predicted scores. Does that mean that for the 3D detections (from the 3D object detection algorithm) we use CLOCs only for the predicted scores?

pangsu0613 commented 3 years ago

@vignesh628, the output of CLOCs is the fused confidence score for each 3D detection candidate, with no bounding box information. We did try to refine (modify) the 3D bounding boxes as well, but the extra information from the image is not very helpful for 3D bounding box regression. Estimating 3D information (3D locations, dimensions, ...) from images still carries a large amount of error, larger than the error from the 3D detector itself. Therefore, we decided to keep the 3D bounding boxes fixed.
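So conceptually the fusion only rescores the existing candidates, something like the following simplified illustration (a hypothetical helper, not the actual CLOCs inference code; the threshold value is an arbitrary example):

```python
def apply_fused_scores(detections_3d, fused_scores, score_threshold=0.05):
    """Swap in the fused confidence for each 3D candidate; boxes, dimensions and yaw stay fixed."""
    rescored = []
    for det, new_score in zip(detections_3d, fused_scores):
        det = dict(det)                      # copy; geometry fields are left untouched
        det["score"] = float(new_score)      # only the confidence changes
        if det["score"] >= score_threshold:  # optional thresholding of the rescored candidates
            rescored.append(det)
    return rescored
```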

vignesh628 commented 3 years ago

@pangsu0613 Thanks for your detailed explanation and your patience. I plotted the 2D detections from Cascade R-CNN taken from the Google Drive link you provided, and I see a lot of false positives.

[screenshot: plotted 2D detections]

This is for image 006989. Does the output from Cascade R-CNN have issues, or am I missing something?

pangsu0613 commented 3 years ago

@vignesh628, the results you plotted are reasonable because I did not apply score thresholding (in order to keep more detection candidates for higher recall), so all of the detection candidates are there. Don't worry, most of these false positives have very low confidence scores (most of them are 0.0), so they do not have much impact on the CLOCs fusion.
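If you just want cleaner plots, thresholding on the confidence before plotting removes most of them. For example (the 0.3 cutoff below is an arbitrary visualization choice, not something CLOCs does internally):

```python
def filter_for_plotting(detections, min_score=0.3):
    """Keep only candidates above a visualization threshold; CLOCs itself uses all candidates."""
    return [d for d in detections if d.get("score") is not None and d["score"] >= min_score]
```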

vignesh628 commented 3 years ago

@pangsu0613 Thanks... But after running inference with CLOCs on this image I am still seeing false positives.

[screenshot: CLOCs output with false positives]

These 2D detections are plotted based on the pickle file generated in eval_results. I used the seccas_rcnn pretrained model provided in the Drive folder. My metrics after running inference are attached. [screenshot: output_metrics_clocs]

Any suggestions on how to remove these false positives?

pangsu0613 commented 3 years ago

@vignesh628, could you double-check that you are using all of the default settings and files that I provided? Have you modified anything?

vignesh628 commented 3 years ago

@pangsu0613, thanks. Yes, there was a mismatch in the final fusion weights being used. Now it is working perfectly fine.

[screenshot: result for frame 006791]

[screenshot: metrics_latest]