huang-yh / SelfOcc

[CVPR 2024] SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Apache License 2.0

About training config details #2

Closed Doctor-James closed 8 months ago

Doctor-James commented 9 months ago

Thank you for your excellent work. I would like to ask some questions about the training configuration.

Your ray_number is set to [192, 400] and num_samples to 256, which are large values for a 6-camera surround view. Do you use fp16? Which GPUs did you use and how many, how much GPU memory does training consume, and how long does training take?

Thanks for your patience.

huang-yh commented 9 months ago

Thank you for your interest!

For nuScenes, we use a ray number of [48, 100] and a sample number of 256. This just fits into a single RTX 3090 GPU with 24 GB of memory without fp16. We train on 8 RTX 3090 GPUs, and training takes less than two days for nuScenes and less than one day for SemanticKITTI and KITTI-2015.
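For reference, here is a minimal sketch of what these settings could look like in a Python config file; the key names follow this thread, and the released configs may structure them differently:

```python
# Hypothetical excerpt of a nuScenes training config; key names follow the
# discussion above and may differ from the released config files.
ray_number = [48, 100]   # rays per image, sampled as a 48 x 100 grid
num_samples = 256        # depth samples per ray

# Rough per-iteration rendering workload with 6 surround-view cameras:
# 48 * 100 * 6 * 256 = ~7.4M sampled points.
```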

Btw, we have released a new version of our paper on arXiv with more implementation details and improved results.

Doctor-James commented 9 months ago

Thanks for your reply. I would like to ask whether your processing of RGB is exactly the same as for SDF: first obtain voxel-level RGB values through an MLP, and then interpolate the RGB value at each sample point. This seems to differ from DVGO and NeuS, which interpolate first and then pass the result through an MLP. Can you still encode direction information into the render_rgb model this way, given that the RGB value of a point seen from different angles should differ?

huang-yh commented 9 months ago

> First obtain voxel-level RGB values through an MLP, and then interpolate the RGB value at each sample point.

Yes, we derive the RGB values of sampled points by first running an MLP over the voxels and then interpolating. This reduces memory consumption considerably, since the number of sampled points is orders of magnitude larger than the number of voxels.
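To illustrate the difference between the two orderings, here is a minimal PyTorch sketch; it is not the repository's code, and the grid size, feature dimension, and MLP are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

C, X, Y, Z = 32, 200, 200, 16                 # feature dim and grid size (assumed)
N = 100_000                                   # sampled points (millions in practice)
voxel_feats = torch.randn(1, C, X, Y, Z)      # dense voxel feature volume
rgb_mlp = torch.nn.Sequential(
    torch.nn.Linear(C, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
pts = torch.rand(1, 1, 1, N, 3) * 2 - 1       # sample coords in [-1, 1] for grid_sample

# MLP-then-interpolate: run the MLP once per voxel (X*Y*Z forward passes) ...
voxel_rgb = rgb_mlp(voxel_feats.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)
# ... then trilinearly interpolate the cheap 3-channel RGB volume at the points.
rgb_a = F.grid_sample(voxel_rgb, pts, align_corners=True)        # (1, 3, 1, 1, N)

# DVGO/NeuS-style interpolate-then-MLP: gather C-dim features at every point ...
pt_feats = F.grid_sample(voxel_feats, pts, align_corners=True)   # (1, C, 1, 1, N)
# ... then run the MLP once per point; in practice the point count far
# exceeds the voxel count, so the MLP activations dominate memory.
rgb_b = rgb_mlp(pt_feats.permute(0, 2, 3, 4, 1))
```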

We encode direction information by predicting the coefficients of spherical harmonics, following Plenoxels. However, we do not actually use direction encoding, because novel view synthesis is not our focus. The corresponding interface in the config files can be found in the following code: https://github.com/huang-yh/SelfOcc/blob/f8c1d0133c4ee4e5ded6d3ac30bbbec9ccceeb6c/config/nuscenes/nuscenes_novel_depth.py#L328
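As an illustration of Plenoxels-style direction encoding, here is a hypothetical sketch (not taken from this repo) that evaluates degree-1 spherical harmonics; with direction encoding effectively disabled, only the constant degree-0 term contributes:

```python
import torch

def eval_sh_deg1(sh_coeffs: torch.Tensor, dirs: torch.Tensor) -> torch.Tensor:
    """Evaluate view-dependent RGB from per-point SH coefficients.

    sh_coeffs: (N, 3, 4) coefficients predicted by the network (e.g. per voxel
               and then interpolated, matching the RGB pipeline above).
    dirs:      (N, 3) unit view directions.
    """
    x, y, z = dirs.unbind(-1)
    basis = torch.stack([
        torch.full_like(x, 0.282095),   # Y_0^0: constant, view-independent
        -0.488603 * y,                  # Y_1^-1
        0.488603 * z,                   # Y_1^0
        -0.488603 * x,                  # Y_1^1
    ], dim=-1)                          # (N, 4)
    return (sh_coeffs * basis.unsqueeze(1)).sum(-1)  # (N, 3) RGB
```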