pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
MIT License
351 stars, 68 forks

Question about fusion of PointRCNN and Cascade R-CNN #31

Closed rkotimi closed 3 years ago

rkotimi commented 3 years ago

Hello @pangsu0613! I know you and your lab mate are working on fusing other 3D and 2D detectors, but I am really interested in your work and eager to fuse the candidates of PointRCNN and Cascade R-CNN now. I have the 3D bounding boxes and 3D confidence scores from PointRCNN, but I cannot follow your code because I am not familiar with SECOND. Could you please tell me how to modify your code to train the fusion network for PointRCNN and Cascade R-CNN? I look forward to your reply. Thank you in advance!

rkotimi commented 3 years ago

I tried to modify your code to fuse the candidates, but failed. I replaced final_box_preds and final_scores in the function train_stage_2 of voxelnet.py with the candidates and confidence scores from PointRCNN, and changed the number 70400 to 100 (because I fixed the number of my candidates to 100). Then I ran the training code. The cls_loss converged quickly, dropping from 4.6547 to 8.7409e-11 after 3700 steps, but the predictions of fusion_layer were very bad. The raw scores in cls_preds (_cls_preds,flag = fusion_layer(fusion_input.cuda(),tensorindex.cuda())) are all negative, most of them around -28, so after the sigmoid function they are almost zero. I don't know why. Could you please help me solve this problem? If you are busy, could you give me some brief guidance?

pangsu0613 commented 3 years ago

Hello @rkotimi, sorry for the late response. I think you are heading in the right direction. There are two points you need to be careful with. (1) In my experience, the confidence scores of PointRCNN are not well scaled. For example, among all detections with confidence scores around 0.6, the actual fraction that is correct (NumberTruePositives/TotalNumber) is not 0.6, which means the scores are biased. For single-modality methods this is fine, but for fusion it is not. The way to solve this is to use the raw score (the log score before the sigmoid) for fusion. You also need to use Cascade R-CNN detections with raw (log) scores; you can download them from the same link that I provided in the README, the file name is "cascade_rcnn_log_data.zip". The point is to use the raw scores from PointRCNN and Cascade R-CNN for fusion, then pass the fusion output through a sigmoid for evaluation. (2) PointRCNN represents the 7 parameters (x,y,z,h,w,l,yaw) of a 3D bounding box differently, so you need to confirm the order and meaning of these parameters. The SECOND definition for this part is in /CLOCs/second/pytorch/core/box_torch_ops.py. Let me know if you have further questions.
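
As a minimal sketch of the score conversion discussed above (this is not code from the CLOCs repository; the function names and sample values are hypothetical), the raw score and the sigmoid score are related as follows. It assumes "raw (log) score" means the value the detector produces before its final sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw (pre-sigmoid) score to a value in (0, 1)."""
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p: float, eps: float = 1e-7) -> float:
    """Inverse of sigmoid: recover a raw score from a sigmoid score."""
    # Clamp so that p == 0 or p == 1 does not produce an infinite raw score.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical raw scores for three 3D candidates.
raw_scores_3d = [2.0, -0.5, -28.0]

# Fuse on raw scores; apply the sigmoid only when evaluating.
eval_scores = [sigmoid(s) for s in raw_scores_3d]
```

A raw score near -28, like the ones reported earlier in this thread, maps to a sigmoid score that is practically zero, which is why such outputs look like "all zeroes" after the sigmoid.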

rkotimi commented 3 years ago

Hello @pangsu0613, thank you for your detailed reply! The two points you mentioned are indeed things that I had not taken into account. Today I double-checked the parameters and corrected some errors, then used the log score for fusion. The result is much better. In fact, the 3D candidates I use are from EPNet (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600035.pdf), which uses PointRCNN as its baseline and fuses image features and point features in a two-stream RPN to improve performance. Interestingly, the AP can still be improved by fusing the candidates. Without CLOCs, the result for the car class on the KITTI validation set (new 40 recall positions metric) is:

      Car AP@0.70, 0.70, 0.70:
      bbox AP:98.2191, 92.3637, 92.0974
      bev  AP:95.0561, 88.7779, 88.3105
      3d   AP:91.3897, 82.2428, 79.9583
      aos  AP:98.07, 91.94, 91.52 

After training CLOCs for 37120 steps, the result for the car class on the KITTI validation set (new 40 recall positions metric) is:

      Car AP@0.70, 0.70, 0.70:
      bbox AP:99.3281, 96.0420, 93.1213
      bev  AP:95.7675, 89.8561, 86.7176
      3d   AP:92.3421, 83.3896, 79.8739
      aos  AP:99.16, 95.45, 92.29

But I still have a question. In your paper, the 3D mAP of PointRCNN (baseline) is 92.54, 82.16, 77.88, and the 3D mAP of PointRCNN+C-RCNN is 93.09, 84.09, 80.73, which is much higher than my result. Is that because I am still missing some key points? For example, I directly take the output of fusion_layer, namely cls_pred, as the raw confidence score (log score) of my 3D detector for post-processing. Is that right?

pangsu0613 commented 3 years ago

Hi @rkotimi, based on the results you showed, I think you are on the right track. There are some points that I think could improve the performance. (1) I suggest using the PointRCNN post-processing functions (such as NMS and so on) to filter the final results. I guess you are still using the SECOND post-processing functions. Different 3D detectors may have different post-processing, so once the fused score is generated, going through the original detector's post-processing is the better option. (2) Currently there are some hard thresholds in the CLOCs implementation, and for different detector setups these thresholds need to be modified (or tested). For example, in /CLOCs/second/pytorch/models/voxelnet.py, line 398: top_predictions=middle_predictions[np.where(middle_predictions[:,4]>=-100)]. This is used to filter out some 2D detections, and for a different detector setup it may need to change. You could try -0.847 (after sigmoid, this is 0.3), because PointRCNN has far fewer detection candidates (100), and feeding in too many 2D detections may spoil the results. In /CLOCs/second/utils/eval.py, in the function "build_stage2_training (...)", there are two elif conditions "elif k==K-1:" that set overlaps[ind,0] = -10 and overlaps[ind,2] = -10. These handle the situation where a 3D detection candidate has no associated 2D detection candidate, and they determine how we fill the 2D-related channels. You could try 0 or -100 here and see how things go.
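
The threshold change in point (2) can be sketched as below. Only the np.where line mirrors the line quoted from voxelnet.py; the column layout of middle_predictions ([x1, y1, x2, y2, raw_score]) and the sample rows are assumptions made for illustration.

```python
import numpy as np


def logit(p: float) -> float:
    """Raw score whose sigmoid equals p."""
    return float(np.log(p / (1.0 - p)))


# A sigmoid score of 0.3 corresponds to a raw score of about -0.847.
score_threshold = logit(0.3)

# Hypothetical 2D detections; column 4 holds the raw (pre-sigmoid) score.
middle_predictions = np.array(
    [
        [10.0, 20.0, 110.0, 90.0, 2.1],
        [15.0, 25.0, 100.0, 80.0, -0.5],
        [200.0, 40.0, 260.0, 95.0, -3.0],
    ],
    dtype=np.float32,
)

# Same filtering pattern as the quoted line, with -100 replaced by the new threshold.
top_predictions = middle_predictions[
    np.where(middle_predictions[:, 4] >= score_threshold)
]
```

With the original value of -100 essentially every 2D detection passes; raising the threshold keeps only the more confident ones, which may matter when there are only about 100 3D candidates.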

rkotimi commented 3 years ago

Hi @pangsu0613, thank you for your suggestions! (1) I do not use SECOND's post-processing functions. I save the output of fusion_layer, namely cls_pred, to files. PointRCNN then reads these files as raw scores (log scores) and runs them through its own post-processing. (2) I have tried several of the parameter combinations you mentioned above. The result is best when I set middle_predictions[:,4]>=-0.847, and overlaps[ind,0] = -10, overlaps[ind,2] = -10 in the two elif conditions. The result for the car class on the KITTI validation set (new 40 recall positions metric) is:

    Car AP@0.70, 0.70, 0.70:
    bbox AP:99.2281, 95.9089, 93.2593
    bev  AP:96.0631, 90.1195, 89.1673
    3d   AP:92.6408, 83.7948, 80.6079
    aos  AP:99.08, 95.43, 92.61

But I guess there is some randomness in the result, so maybe I should try again. By the way, I wonder why the 3D mAP of PointRCNN (baseline) in your paper is much higher than the result in the original paper. Do you use an improved version of PointRCNN?

pangsu0613 commented 3 years ago

Hello @rkotimi, I don't have an improved version of PointRCNN; I just used their officially released code from here: https://github.com/sshaoshuai/PointRCNN. I think the results they showed in the paper are not based on the new 40 recall points. KITTI changed this metric at the end of 2019 (the old one is based on 11 recall points, and its numbers are smaller). Also, I remember that PointRCNN made some modifications to the codebase they released after they submitted the paper to CVPR.
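
To illustrate why the same detector can report different AP under the two metrics, here is a small sketch of interpolated AP sampled at 11 and at 40 recall positions. It is a simplified illustration, not the KITTI evaluation code: the function name and the toy precision-recall curve are made up, and the sampling grids (0, 0.1, ..., 1.0 versus 1/40, 2/40, ..., 1.0) are stated as an assumption about how the two metrics are defined.

```python
from typing import List, Sequence


def interpolated_ap(
    recalls: Sequence[float],
    precisions: Sequence[float],
    sample_points: Sequence[float],
) -> float:
    """Average, over the sample points, of the best precision at recall >= point."""
    total = 0.0
    for r in sample_points:
        reachable = [p for rec, p in zip(recalls, precisions) if rec >= r]
        # A recall level the detector never reaches contributes zero precision.
        total += max(reachable) if reachable else 0.0
    return total / len(sample_points)


# Assumed sampling grids for the old and new metrics.
r11: List[float] = [i / 10 for i in range(11)]
r40: List[float] = [i / 40 for i in range(1, 41)]

# Toy precision-recall curve (made-up numbers, not from any detector).
recalls = [0.5, 0.95]
precisions = [1.0, 0.9]

ap11 = interpolated_ap(recalls, precisions, r11)
ap40 = interpolated_ap(recalls, precisions, r40)
```

For this toy curve the 40-point value comes out higher than the 11-point value, so numbers computed under the two schemes should not be compared directly.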

rkotimi commented 3 years ago

I got it. Thank you very much! I don't have further questions now.

arsalan311 commented 3 years ago

Hi @FolliesHandle @pangsu0613, I am following the same procedure for the Lidar MTL 3D detector. I replaced final_box_preds and the scores with the ones I got from Lidar MTL, but I am getting the error below. Are there any config-related settings I might have missed?

      error_rewrite(e, 'typing')
    File "/home/arsalan/.local/lib/python3.6/site-packages/numba/core/dispatcher.py", line 361, in error_rewrite
      raise e.with_traceback(None)
    numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No implementation of function Function() found for signature:

    getitem(array(float32, 1d, C), Tuple(int64, Literalint))

    There are 22 candidate implementations:

    During: typing of intrinsic-call at /home/arsalan/CLOC_Project/CLOC_2/CLOC2/CLOCs/second/utils/eval.py (155)

    File "utils/eval.py", line 155:
    def build_stage2_training(boxes, query_boxes, criterion, scores_3d, scores_2d, dis_to_lidar_3d,overlaps,tensor_index):

                overlaps[ind,0] = iw * ih / ua
                overlaps[ind,1] = scores_3d[n,0]
                ^

pangsu0613 commented 3 years ago

Hello @arsalan311, sorry for the late response. The error you got occurs in CLOCs/second/utils/eval.py, in the function build_stage2_training(...). This function is compiled with numba to increase the running speed, so it does not show the specific error you encountered. You could therefore comment out @numba.jit(nopython=True,parallel=True) for this function, and then you will get the specific Python error. My guess is that the dimensions of an input argument are most likely not set up properly.

arsalan311 commented 3 years ago

Okay, thanks. I commented out numba, but then got this error: IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed. I guess I will need to verify the dimensions of the detection candidates.
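
A likely cause, judging only from the indexing scores_3d[n,0] in the traceback above, is that the scores were passed as a 1-D array where a column of shape (N, 1) is indexed. The sketch below reproduces the IndexError and shows one way to reshape the scores; the helper name as_column is hypothetical and the expected shape is an inference from that indexing, not something confirmed by the repository.

```python
import numpy as np


def as_column(scores) -> np.ndarray:
    """Return the scores as a float32 array of shape (N, 1)."""
    scores = np.asarray(scores, dtype=np.float32)
    if scores.ndim == 1:
        scores = scores.reshape(-1, 1)
    if scores.ndim != 2 or scores.shape[1] != 1:
        raise ValueError(f"expected shape (N,) or (N, 1), got {scores.shape}")
    return scores


# Hypothetical scores from another 3D detector, stored as a flat vector.
scores_1d = np.array([0.9, 0.4, 0.1], dtype=np.float32)

message = ""
try:
    scores_1d[0, 0]  # the same two-index access as scores_3d[n,0]
except IndexError as err:
    message = str(err)  # "too many indices for array ..."

scores_3d = as_column(scores_1d)
first_score = scores_3d[0, 0]  # two-index access now succeeds
```

Checking the shapes of every replaced input before the numba-compiled call is cheaper than decoding a TypingError afterwards.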

arsalan311 commented 2 years ago

Hi @pangsu0613, the training is working perfectly. I need some input on the eval. The eval part of the code has predict_v2, which passes the original 3D predictions, but I replaced the final scores and final preds with those from another LiDAR detector, as mentioned by @rkotimi. Is there any way I can work around this and just use the indices of the cls_preds from fusion, by setting a threshold and selecting those specific rows?

CrapbagMo commented 2 years ago

Hi, thanks for the good work. I encountered the same problem, and I don't know what to do with the "example['anchors']". I tried to replace them with the final preds from my 3D detector, but it did not work. Somehow the raw scores are all negative, so the final results are all zero. @rkotimi @pangsu0613 Could you help me?

pangsu0613 commented 2 years ago

Hello @CrapbagMo, sorry for the late response. May I know the score scale of your 3D detector? Some 3D detectors output sigmoid scores (in the range 0~1), and some output raw log scores. I guess your raw scores are on a logarithmic scale; if so, you will need to transform them into sigmoid scores using the sigmoid function (https://en.wikipedia.org/wiki/Sigmoid_function) and then do the CLOCs fusion. Let me know if you have further questions.