wzzheng / TPVFormer

[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.
https://wzzheng.net/TPVFormer/
Apache License 2.0
1.19k stars, 107 forks

Is there a config for the large occupancy model? #17

Open WuDianQiBian opened 1 year ago

WuDianQiBian commented 1 year ago

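Hello authors, first of all, thank you very much for this excellent work. The paper states that the TPV resolution used when training for occupancy is 200x200x16 with dim 128, but in tpv04_occupancy.py the TPV resolution is 100x100x8 and dim is 256: [image]

Could you upload a new config, ideally one consistent with the paper? That would make it easier for everyone to reproduce the results. Many thanks!

(By the way, the newly uploaded visualization code is in the visualization folder, but some package imports use the name visualize. It is a minor bug, just so you know.)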

huang-yh commented 1 year ago

Sorry for the confusion. In the paper we actually use a 100x100x8 TPV resolution with a 256-dimensional feature for 3D semantic occupancy prediction, and a 200x200x16 TPV resolution with a 128-dimensional feature for LiDAR segmentation. Note that the resolution of the TPV planes is not tied to the voxel resolution used for visualization, since we can upsample the TPV planes at test time, as shown in Fig. 6. That said, finer details could be expected if higher-resolution TPV planes are used. Also, thanks for reporting the bug to us.
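As a rough illustration of the two settings described above, the sketch below tabulates them and counts the plane cells and feature values each one implies. `TPVSetting` and its methods are hypothetical helpers written for this thread, not identifiers from the repository, and the sketch assumes the three planes are the H×W, Z×H, and W×Z planes described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TPVSetting:
    """One TPV configuration (hypothetical container, not a repo class)."""
    h: int    # TPV resolution along H
    w: int    # TPV resolution along W
    z: int    # TPV resolution along the height axis
    dim: int  # feature dimension of each plane cell

    def plane_cells(self) -> int:
        # Assumed layout: three axis-aligned planes, HxW, ZxH and WxZ.
        return self.h * self.w + self.z * self.h + self.w * self.z

    def feature_values(self) -> int:
        # Total number of scalars stored across the three planes.
        return self.plane_cells() * self.dim

    def upsample_factor(self, target):
        """Per-axis integer factor needed to reach a target (H, W, Z) grid."""
        sizes = (self.h, self.w, self.z)
        if any(t % s != 0 for t, s in zip(target, sizes)):
            raise ValueError("target grid is not an integer multiple")
        return tuple(t // s for t, s in zip(target, sizes))


# The two settings quoted in the reply above.
occupancy = TPVSetting(h=100, w=100, z=8, dim=256)
lidarseg = TPVSetting(h=200, w=200, z=16, dim=128)

for name, cfg in (("occupancy", occupancy), ("lidarseg", lidarseg)):
    print(name, cfg.plane_cells(), cfg.feature_values())

# Factor needed to query a 200x200x16 voxel grid from the occupancy planes.
print(occupancy.upsample_factor((200, 200, 16)))
```

Under these assumptions the 200x200x16 setting has four times as many plane cells but, with half the feature dimension, only twice as many stored feature values. The 100x100x8 planes need a 2x upsampling per axis to cover a 200x200x16 voxel grid.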

WuDianQiBian commented 1 year ago

I see, thanks for the reply. I understand that the TPV planes can be upsampled at test time. But is there any reason not to use 200x200x16 for training in your paper? Did you observe only minor improvements when going from 100x100x8 (with 2x upsampling at test time) to 200x200x16 (no upsampling at test time)?

huang-yh commented 1 year ago

In fact, when training with a resolution of 200x200x16, we did not notice a substantial qualitative improvement in the visualizations. We think this might be due to the sparse nature of LiDAR supervision, which becomes even sparser at higher resolution.
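The sparsity argument can be illustrated with a toy experiment: voxelize the same fixed set of points at both resolutions and compare the fraction of voxels that receive at least one point. This is only a sketch under loud assumptions. The points are uniformly random in a unit cube, which real LiDAR sweeps are not, and the point count of 20,000 and the helper `occupied_fraction` are hypothetical choices made for illustration.

```python
import random


def occupied_fraction(points, grid):
    """Fraction of voxels in `grid` (H, W, Z) that contain at least one point.

    `points` are (x, y, z) tuples with every coordinate in [0, 1).
    """
    h, w, z = grid
    occupied = {(int(x * h), int(y * w), int(k * z)) for x, y, k in points}
    return len(occupied) / (h * w * z)


# Hypothetical stand-in for one LiDAR sweep: 20,000 uniformly random points.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(20_000)]

coarse = occupied_fraction(points, (100, 100, 8))
fine = occupied_fraction(points, (200, 200, 16))
print(f"100x100x8:  {coarse:.3f} of voxels contain a point")
print(f"200x200x16: {fine:.3f} of voxels contain a point")
```

Because the finer grid has eight times as many voxels while the number of points is fixed, the supervised fraction at 200x200x16 cannot exceed 20,000 / 640,000, or about 3.1%, in this toy setup.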