pkqbajng closed this issue 5 months ago.
Thanks for your attention. Our baseline on nuScenes (BEVStereo-occ) was initialized from pre-trained detection weights, so RenderOcc adopts the same setup for a fair comparison. You can try training from scratch for more epochs, which may result in slightly lower performance. For results without pre-trained 3D detection weights, see Table 3 (nuScenes) and Table 2 (SemanticKITTI) of our arXiv paper.
It's worth mentioning that RenderOcc is not a self-supervised method. Instead, it avoids the laborious creation of 3D occupancy labels while still using easily obtainable labels such as segmentation and depth. For self-supervised occupancy methods, see recent works such as OccNeRF and SelfOcc.
Hi! I was also trying to reproduce the results of training from scratch with only 2D labels and got 19.54 mIoU. According to the tables you mentioned, is the result for training from scratch 20.2 mIoU on nuScenes and 8.24 mIoU on SemanticKITTI? Thank you!
Yes, we achieved 20.2 on nuScenes and 8.24 on SemanticKITTI.
That experiment was conducted with a very early version of our model (UniOcc), and there are some parameter differences between it and the current code that we can no longer trace. You may need to adjust the learning rate and train for more epochs.
Hi, thanks for your excellent work. However, I noticed that you initialize from a checkpoint pre-trained on object detection in a supervised manner. Is that reasonable for a self-supervised task?