pmj110119 / RenderOcc

[ICRA 2024] RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision. (Early version: UniOcc)

Question about the pretrained checkpoint #36

Closed: pkqbajng closed this issue 5 months ago

pkqbajng commented 8 months ago

Hi, thanks for your excellent work. However, I noticed that you use a checkpoint pretrained on object detection in a supervised manner. Is that reasonable for a self-supervised task?

pmj110119 commented 8 months ago

Thanks for your interest. Since our baseline (BEVStereo-occ) on nuScenes used pre-trained detection weights, RenderOcc adopts the same setup for a fair comparison. You may try training from scratch, which could lead to a slight decrease in performance after training for more epochs. You can refer to the experiments in Table 3 (nuScenes) and Table 2 (SemanticKITTI) of our arXiv paper, which did not use pre-trained weights for 3D detection.

It's worth mentioning that RenderOcc is not a self-supervised method; rather, it avoids the laborious creation of 3D occupancy labels. We still use easily obtainable labels such as segmentation and depth. For self-supervised occupancy methods, you can refer to recent works such as OccNeRF and SelfOcc.
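
For readers who want to try the from-scratch setup mentioned above, here is a minimal sketch. It assumes the training config follows the MMCV 1.x convention of a top-level `load_from` key. The dictionary contents, the checkpoint path, and the helper name `without_pretrained_weights` are all hypothetical and are not taken from the RenderOcc repository.

```python
import copy


def without_pretrained_weights(cfg: dict) -> dict:
    """Return a copy of an MMCV-style config dict with checkpoint loading disabled.

    Hypothetical helper: only the top-level `load_from` key is cleared.
    """
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    new_cfg["load_from"] = None   # None means no checkpoint is loaded before training
    return new_cfg


# Placeholder config; the path below is illustrative, not a real RenderOcc file.
base_cfg = {
    "load_from": "path/to/pretrained_detection.pth",
    "runner": {"type": "EpochBasedRunner", "max_epochs": 12},
}

scratch_cfg = without_pretrained_weights(base_cfg)
print(scratch_cfg["load_from"])
```

This sketch only clears the top-level checkpoint. Any backbone-level initialization in the model definition (for example an ImageNet-pretrained image encoder) is a separate setting and is not changed here.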

rmarcuzzi commented 2 months ago

Hi! I was also trying to reproduce the results of training from scratch with only 2D labels, and I achieved 19.54 mIoU. According to the tables you mentioned from the paper, is the from-scratch result 20.2 mIoU on nuScenes and 8.24 on SemanticKITTI? Thank you!

pmj110119 commented 2 months ago

Yes, we achieved 20.2 on nuScenes and 8.24 on SemanticKITTI.

The earlier experiment was conducted with a very early version of our model (UniOcc), and there are some untraceable parameter differences between it and the current code. You might need to adjust the learning rate and train for more epochs.
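
The suggestion to adjust the learning rate and train for more epochs can be sketched as a small config override. This assumes MMCV 1.x-style `optimizer` and `runner` entries. The numeric values, the `lr_scale` parameter, and the helper name `retune_schedule` are placeholders for illustration, not settings reported by the authors.

```python
import copy


def retune_schedule(cfg: dict, lr_scale: float, max_epochs: int) -> dict:
    """Return a copy of the config with a scaled learning rate and a new epoch budget.

    Hypothetical helper for sweeping schedules; it does not modify the input dict.
    """
    new_cfg = copy.deepcopy(cfg)
    new_cfg["optimizer"]["lr"] = cfg["optimizer"]["lr"] * lr_scale
    new_cfg["runner"]["max_epochs"] = max_epochs
    return new_cfg


# Placeholder values, not taken from the RenderOcc configs.
base_cfg = {
    "optimizer": {"type": "AdamW", "lr": 1e-4, "weight_decay": 1e-2},
    "runner": {"type": "EpochBasedRunner", "max_epochs": 12},
}

tuned_cfg = retune_schedule(base_cfg, lr_scale=2.0, max_epochs=24)
print(tuned_cfg["optimizer"]["lr"], tuned_cfg["runner"]["max_epochs"])
```

Keeping the override in a function that returns a copy makes it easy to compare several learning-rate and epoch combinations against the same base config.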