open-mmlab / OpenPCDet

OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
Apache License 2.0
4.63k stars 1.29k forks

Reproduce the released results #250

Closed LCJHust closed 4 years ago

LCJHust commented 4 years ago

Hi, I tried to train PointPillars and PV-RCNN with the default configs, only modifying the number of training epochs, but I can't reproduce the released results (with 8 RTX 2080 Ti GPUs). In your released model zoo, PointPillars (moderate categories) gets 77.28 / 52.29 / 62.28, but my best results are 75.52 / 44.69 / 63.91. For PV-RCNN, the released results are 83.61 / 57.90 / 70.47, and my best are 78.89 / 52.86 / 72.21. Is this gap reasonable? Or can you provide any suggestions for fine-tuning the model to get better performance? Thank you!

MartinHahner commented 4 years ago

The answer depends on the number of epochs you trained for. If you only lowered the number of epochs by a few, my feeling is that it should not drop that much, but if you only train for, say, 40 to 60 epochs, that could very well happen. PointPillars' performance in particular is relatively unstable during training (see the validation curves below: PointPillars in green vs. PV-RCNN in red).

davidwang200099 commented 4 years ago

Excuse me, how should I modify the code to reproduce the released results if I only have 2 GPUs at hand? I would appreciate it if anyone could explicitly show me what the modified code should be!

LCJHust commented 4 years ago

> The answer depends on the number of epochs you trained for. If you only lowered the number of epochs by a few, my feeling is that it should not drop that much, but if you only train for, say, 40 to 60 epochs, that could very well be.

Thank you! Actually, I think I have trained long enough. I used the default config settings and only changed the number of training epochs. I trained PointPillars for 100 epochs (8 GPUs × 4 samples per GPU = 32 total batch size) and PV-RCNN for 120 epochs (6 GPUs × 2 samples per GPU = 12 total batch size), and the results on the KITTI val set are not good, as I mentioned above. So I am quite confused~

sshaoshuai commented 4 years ago

@LCJHust Have you checked your generated gt_database and infos? If you used the old gt_database and kitti_infos (generated by the previous PCDet v0.1), you need to re-create these infos and the gt_database for the current OpenPCDet, since the coordinate system was changed in the latest version of OpenPCDet.

LCJHust commented 4 years ago

> @LCJHust Have you checked your generated gt_database and infos? If you used the old gt_database and kitti_infos (generated by the previous PCDet v0.1), you need to re-create these infos and the gt_database for the current OpenPCDet, since the coordinate system was changed in the latest version.

Thank you! I am using PCDet v0.3. I evaluated your pretrained models on my generated gt_database & infos, and I got results similar to your released model zoo.

So maybe the generated gt_database & infos are correct? I will try to re-create the infos and train again. By the way, my current setup is: torch 1.4.0, spconv 1.0, pcdet 0.3.0+96a6092.

LCJHust commented 4 years ago

> The answer depends on the number of epochs you trained for. If you only lowered the number of epochs by a few, my feeling is that it should not drop that much, but if you only train for, say, 40 to 60 epochs, that could very well be.

Hi, I checked my tensorboard_val curves during training, and it seems that my R40 curve is similar to yours. However, I am confused: are the results in your released model zoo AP or AP_R40? Which evaluation metric should I compare against your released results? Thank you! Hoping for your reply!

jihanyang commented 4 years ago

All results released in the model zoo are AP_R11.

LCJHust commented 4 years ago

> All results released in the model zoo are AP_R11.

Thank you! I think so. My AP_R40 curve and AP_R40 numbers are similar to @MartinHahner88's, but there is a large gap in the AP_R11 metric. It's so strange o(╥﹏╥)o

LCJHust commented 4 years ago

> Excuse me, how should I modify the code to reproduce the released results if I only have 2 GPUs at hand? I would appreciate it if anyone could explicitly show me what the modified code should be!

Maybe you can try a command like this:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --cfg_file xxx/xxx/xxx
```
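One caveat when dropping from 8 GPUs to 2: the effective (total) batch size shrinks, and a common heuristic (not guaranteed to reproduce the released numbers) is to scale the learning rate linearly with the total batch size. A tiny helper illustrating the rule; the example numbers are assumptions, not values from the config:

```python
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: learning rate scales proportionally with total batch size."""
    return base_lr * new_batch / base_batch

# Hypothetical example: a config tuned for 8 GPUs x 4 samples (32 total),
# now trained on 2 GPUs x 4 samples (8 total).
lr = scaled_lr(0.003, base_batch=32, new_batch=8)
print(lr)  # -> 0.00075
```

Alternatively, keeping the per-GPU batch size higher (if memory allows) avoids changing the learning rate at all.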

MartinHahner commented 4 years ago

> All results released in the model zoo are AP_R11.

Maybe it would be better to state AP_R40 as the main metric (and ditch AP_R11 entirely), since the literature recommends using AP_R40.

LCJHust commented 4 years ago

> All results released in the model zoo are AP_R11.

> Maybe it would be better to state AP_R40 as the main metric (and ditch AP_R11 entirely), since the literature recommends using AP_R40.

Thank you. Some literature does recommend using AP_R40, actually. I found the reason why I could not reproduce your results: I did not use the road plane data during training o(╥﹏╥)o. After I added the plane data, I got results similar to yours! Thank you for your guidance! By the way, I have read the code related to road planes; it is used to place sampled gt boxes onto the road plane. Why does that work? Also, the road plane data does not seem to be official KITTI data, so did you segment the road from the point clouds and compute the plane equation yourselves?
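For context, a road plane is usually described by the equation a·x + b·y + c·z + d = 0. Below is a minimal, illustrative sketch of how such a plane could be fitted to ground points with a least-squares fit via SVD; this is an assumption about the general technique, not the actual script used to generate OpenPCDet's planes files:

```python
import numpy as np

def fit_plane(points: np.ndarray) -> np.ndarray:
    """Least-squares fit of a plane a*x + b*y + c*z + d = 0 to an (N, 3) point set.

    Returns [a, b, c, d], where the normal (a, b, c) has unit length.
    """
    centroid = points.mean(axis=0)
    # SVD of the centered points: the right singular vector belonging to the
    # smallest singular value is the direction of least variance, i.e. the normal.
    _, _, vh = np.linalg.svd(points - centroid)
    normal = vh[-1]
    d = -normal.dot(centroid)
    return np.append(normal, d)

# Toy example: four points lying exactly on the plane z = 0.
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0]])
plane = fit_plane(pts)  # normal is (0, 0, ±1), d is 0
```

During gt sampling augmentation, such a plane lets the code compute the correct height at which a pasted box should sit, so sampled objects rest on the ground instead of floating or sinking.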

zhongzhuonan commented 4 years ago

> All results released in the model zoo are AP_R11.

> Maybe it would be better to state AP_R40 as the main metric (and ditch AP_R11 entirely), since the literature recommends using AP_R40.

Hi, the results are the 3D detection performance at moderate difficulty on the val set of the KITTI dataset, such as the SECOND results. What does the AP value mean, and what does this result represent? When I train the model, I get car AP and car AP_R40. What does AP_R40 mean? What do you mean by AP_R11? Thank you!

MartinHahner commented 4 years ago

> When I train the model, I get the car AP and car AP_R40. What does AP_R40 mean? What do you mean by AP_R11?

> Note 2: On 08.10.2019, we have followed the suggestions of the Mapillary team in their paper Disentangling Monocular 3D Object Detection and use 40 recall positions instead of the 11 recall positions proposed in the original Pascal VOC benchmark. This results in a more fair comparison of the results, please check their paper.

From the KITTI leaderboard: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
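To make the difference concrete: both metrics average interpolated precision over a fixed set of recall positions, AP_R11 over the 11 positions 0, 0.1, ..., 1.0 and AP_R40 over the 40 positions 1/40, 2/40, ..., 1.0. A simplified sketch of the idea (for illustration only, not the exact KITTI evaluation code):

```python
import numpy as np

def interpolated_ap(recalls, precisions, recall_points):
    """Average interpolated precision over the given recall positions.

    For each recall position r, the interpolated precision is the maximum
    precision over all samples whose recall >= r (0 if no sample reaches r).
    """
    recalls = np.asarray(recalls)
    precisions = np.asarray(precisions)
    ap = 0.0
    for r in recall_points:
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / len(recall_points)

# AP_R11 recall positions: 0.0, 0.1, ..., 1.0 (original Pascal VOC scheme).
r11 = np.arange(11) / 10.0
# AP_R40 recall positions: 1/40, 2/40, ..., 1.0 (KITTI since 08.10.2019).
r40 = np.arange(1, 41) / 40.0

# Toy precision-recall samples, invented purely for illustration.
rec = [0.0, 0.2, 0.4, 0.6, 0.8]
prec = [1.0, 0.9, 0.8, 0.6, 0.4]
print(interpolated_ap(rec, prec, r11), interpolated_ap(rec, prec, r40))
```

Even on the same toy curve the two schemes give noticeably different numbers, which is why an AP_R11 result should never be compared directly against an AP_R40 result.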

zhongzhuonan commented 4 years ago

> When I train the model, I get the car AP and car AP_R40. What does AP_R40 mean? What do you mean by AP_R11?

Thank you.

jialeli1 commented 3 years ago

> @LCJHust Have you checked your generated gt_database and infos? If you used the old gt_database and kitti_infos (generated by the previous PCDet v0.1), you need to re-create these infos and the gt_database for the current OpenPCDet, since the coordinate system was changed in the latest version.

> Thank you! I am using PCDet v0.3. I evaluated your pretrained models on my generated gt_database & infos, and I got results similar to your released model zoo. So maybe the generated gt_database & infos are correct? I will try to re-create the infos and train again. By the way, my current setup is: torch 1.4.0, spconv 1.0, pcdet 0.3.0+96a6092.

Hi, I evaluated the released PV-RCNN model and got exactly the same numbers as in your picture. Does this mean that my generated gt_database & infos are correct? Did you solve this problem later?

RolandoAvides commented 3 years ago

> The answer depends on the number of epochs you trained for. If you only lowered the number of epochs by a few, my feeling is that it should not drop that much, but if you only train for, say, 40 to 60 epochs, that could very well be. Especially PointPillars' performance is relatively unstable during training (see below in green vs. PV-RCNN in red).

Hi! How did you produce those plots in TensorBoard? I used the TensorBoard logging provided by this repo, but I did not get any plot similar to yours.

Thanks in advance!

MartinHahner commented 3 years ago

I added `args.start_epoch = 0` (in order to evaluate all epochs) before this line: https://github.com/open-mmlab/OpenPCDet/blob/0642cf06d0fd84f50cc4c6c01ea28edbc72ea810/tools/train.py#L188

I think that was all.

LCJHust commented 2 years ago

[Automatic vacation reply from QQ Mail] Hello, I have received your email and will reply as soon as possible.

Qizhi697 commented 2 years ago

> @LCJHust Have you checked your generated gt_database and infos? If you used the old gt_database and kitti_infos (generated by the previous PCDet v0.1), you need to re-create these infos and the gt_database for the current OpenPCDet, since the coordinate system was changed in the latest version.

> Thank you! I am using PCDet v0.3. I evaluated your pretrained models on my generated gt_database & infos, and I got results similar to your released model zoo. So maybe the generated gt_database & infos are correct? I will try to re-create the infos and train again. By the way, my current setup is: torch 1.4.0, spconv 1.0, pcdet 0.3.0+96a6092.

> Hi, I evaluated the released PV-RCNN model and got exactly the same numbers as in your picture. Does this mean that my generated gt_database & infos are correct? Did you solve this problem later?

Hi, when reproducing PV-RCNN on the KITTI dataset, I also got the same results as the "model_zoom_reproduced" column in this picture, which shows a difference in the "Pedestrian" and "Cyclist" classes compared with the official released results. I wonder what causes this difference?