How to perform OCC prediction using only the front camera?

weiyithu / SurroundOcc

[ICCV 2023] SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving

Apache License 2.0


How to perform OCC prediction using only the front camera? #90

Open AlphaPlusTT opened 9 months ago

AlphaPlusTT commented 9 months ago

I want to test inference using only the front camera's images. During inference, I therefore set all three channels of the images from the other five cameras to the mean specified in `img_norm_cfg` in `surroundocc_inference.py`. This ensures that, after data preprocessing (normalization), the images from the other five directions are all zeros and only the front image carries content. Even so, the inference results still show occupancy predictions not only in the front camera's field of view but also in many other directions. What could be the reason for this?
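For readers who want to reproduce the setup described above, here is a minimal sketch of the mean-fill trick. The `IMG_NORM_CFG` values, the `(N, H, W, 3)` layout, and the helper names `blank_views` and `normalize` are assumptions for illustration, not code from the repository; the actual mean, std, and channel order should be taken from the config in use.

```python
import numpy as np

# Hypothetical values in the style of an mmdet `img_norm_cfg`;
# read the real ones from the config you are running.
IMG_NORM_CFG = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0])


def blank_views(imgs, keep_idx, norm_cfg):
    """Return a copy of imgs (N, H, W, 3) where every view except
    keep_idx is filled with the per-channel normalization mean."""
    mean = np.asarray(norm_cfg["mean"], dtype=np.float32)
    out = imgs.astype(np.float32, copy=True)
    for i in range(out.shape[0]):
        if i != keep_idx:
            out[i] = mean  # broadcasts over H and W
    return out


def normalize(imgs, norm_cfg):
    """Per-channel (x - mean) / std, as done in preprocessing.
    The mean must be in the same channel order as the images."""
    mean = np.asarray(norm_cfg["mean"], dtype=np.float32)
    std = np.asarray(norm_cfg["std"], dtype=np.float32)
    return (imgs - mean) / std


# Six random "camera images"; index 0 plays the role of the front camera.
rng = np.random.default_rng(0)
imgs = rng.uniform(0, 255, size=(6, 4, 5, 3)).astype(np.float32)
normed = normalize(blank_views(imgs, keep_idx=0, norm_cfg=IMG_NORM_CFG), IMG_NORM_CFG)
```

After this step the five blanked views are exactly zero at the network input, which is the condition the question assumes. It says nothing yet about what the network does with those zeros.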

weiyithu commented 9 months ago

Since the network is supervised by 3D occupancy ground truth that covers all camera views, it may guess the occupancy of regions it cannot see. Also, setting the other five cameras' images to the mean value may not work as intended: the biases in the BN layers lead to non-zero activations for these five views. I think you should follow the approach used in MonoScene. In other words, use only the front-view occupancy ground truth and retrain the model with only the front view as input.
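The remark about BN layers can be checked with a few lines of NumPy. The sketch below implements inference-mode batch normalization by hand; the function `batchnorm_eval` and all parameter values are made up for illustration and are not taken from the SurroundOcc backbone.

```python
import numpy as np


def batchnorm_eval(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference-mode batch norm over the channel axis of (N, C, H, W):
    y = gamma * (x - running_mean) / sqrt(running_var + eps) + beta."""
    shape = (1, -1, 1, 1)
    x_hat = (x - running_mean.reshape(shape)) / np.sqrt(running_var.reshape(shape) + eps)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)


# An all-zero input, like a blanked camera view after normalization.
zero_view = np.zeros((1, 2, 4, 4), dtype=np.float32)

# Illustrative (invented) learned parameters and running statistics.
gamma = np.array([1.0, 0.5], dtype=np.float32)
beta = np.array([0.1, -0.2], dtype=np.float32)
running_mean = np.array([0.3, 0.0], dtype=np.float32)
running_var = np.array([1.0, 4.0], dtype=np.float32)

out = batchnorm_eval(zero_view, gamma, beta, running_mean, running_var)
print(out[0, :, 0, 0])  # per-channel output for the zero input
```

Both channels come out non-zero: the first because of the running mean and the shift `beta`, the second because of `beta` alone. Any later layer therefore sees a constant but non-zero feature map for the blanked views, so those views are not truly "absent" from the model's point of view.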

AlphaPlusTT commented 9 months ago

@weiyithu Thank you for your detailed explanation. My dataset includes four cameras, but it's not a complete 360° setup. Currently, I've only calibrated the front camera and would like to initially test the model using it. I'm curious about whether the number of cameras during inference needs to match the training setup or if similar camera installation angles are necessary. Have you explored this aspect in your experiments? Any insights or assistance you could offer on this would be greatly appreciated.

weiyithu commented 9 months ago

I think you can change the number of images in a batch from 6 to 4 instead of using zero images. However, I cannot guarantee the accuracy, since your four images do not cover 360° and our model is trained with six surrounding images that cover 360°.
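A rough sketch of what "changing the number of images" could look like on the data side, as opposed to feeding blank images. It assumes the views are stacked as an `(N, C, H, W)` array with one 4x4 projection matrix per camera; the helper `select_cameras` and the array names are hypothetical and not part of the repository. Any camera-count setting in the model config, if present, would need to be changed to match.

```python
import numpy as np


def select_cameras(imgs, lidar2img, keep):
    """Keep only the listed camera indices.

    imgs:      (N, C, H, W) array, one image per camera (assumed layout).
    lidar2img: (N, 4, 4) array, one projection matrix per camera (assumed).
    keep:      indices of the calibrated cameras to feed to the model.
    """
    keep = list(keep)
    if imgs.shape[0] != lidar2img.shape[0]:
        raise ValueError("need exactly one projection matrix per image")
    # Index both arrays with the same list so images and matrices stay paired.
    return imgs[keep], lidar2img[keep]


# Example: six views reduced to four calibrated cameras.
imgs = np.random.rand(6, 3, 8, 16).astype(np.float32)
lidar2img = np.tile(np.eye(4, dtype=np.float32), (6, 1, 1))
sub_imgs, sub_proj = select_cameras(imgs, lidar2img, keep=[0, 1, 2, 3])
print(sub_imgs.shape, sub_proj.shape)
```

The important design point is that images and projection matrices are dropped together, so the model never projects 3D queries into a view that has no real image behind it.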

AlphaPlusTT commented 9 months ago

Thank you for your patient answer. Would you mind sharing more information about the angles between the cameras and their installation positions in the in-the-wild data mentioned in the README? If there are pictures to illustrate this, that would be great. Thank you very much.