huang-yh / SelfOcc

[CVPR 2024] SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Apache License 2.0

Inquiry on code release for compared methods #13

Open Letian-Wang opened 5 months ago

Letian-Wang commented 5 months ago

Hi SelfOcc authors,

This work is so awesome! Really enjoyed it!

Do you plan to release the code you used to reproduce the compared methods? For example, will you release your reproduced implementation of SceneRF on nuScenes? It would be very helpful for following up on your work. Thanks!

huang-yh commented 3 months ago

Sorry, but we are not planning to release the code for the methods we compare with, since those implementations are built entirely on the original repos, which are unrelated to our own repo. However, we do briefly describe the adaptation strategies in our paper.

Letian-Wang commented 3 months ago

Thanks a lot for the response!

I have another question. I notice that SelfOcc generates very dense occupancy predictions; for example, almost the whole ground is predicted to be occupied. What is the key to such dense predictions, given that many methods predict sparse occupancy, especially for far-away ground? In Fig 4, for instance, the prediction seems to start from full occupancy and gradually remove unnecessary occupancy, yet at the end the ground is still predicted to be fully occupied, even in invisible regions (the ground behind the trees on the left and right sides). It would be very helpful if you could share more insight on this. Thanks!

huang-yh commented 3 months ago

We think the key to the dense prediction in SelfOcc is the strategy used to sample camera rays for supervision. We sample nearly random rays from randomly chosen temporally adjacent frames, which yields comprehensive supervision of the SDF/density values throughout the whole space. Note that this is inherently different from supervised methods, whose predictions can be affected to some degree by the pattern of the annotations. As for the ground predicted in invisible regions, we attribute it to a strong but simple prior learned by the model: the ground is flat and very likely extends across the 3D space. There is, however, no guarantee that this prior is correct.
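To make the sampling idea above concrete, here is a minimal sketch of drawing supervision rays uniformly over (adjacent frame, camera, pixel) and turning a pixel into a ray direction with a pinhole model. The helper names, the frame offsets, the camera count and the image size are all hypothetical and chosen for illustration; this is not SelfOcc's actual code or configuration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Ray:
    frame_offset: int  # temporally adjacent frame the ray comes from (0 = current frame)
    cam_id: int        # camera index within that frame
    u: float           # pixel column (continuous)
    v: float           # pixel row (continuous)


def sample_supervision_rays(num_rays, adjacent_offsets, num_cams, width, height, rng=None):
    """Hypothetical helper: draw rays uniformly over (frame, camera, pixel).

    Uniform sampling spreads supervision over the whole image plane of every
    frame, rather than only where annotations happen to exist.
    """
    if rng is None:
        rng = random.Random()
    rays = []
    for _ in range(num_rays):
        offset = rng.choice(adjacent_offsets)  # random temporally adjacent frame
        cam = rng.randrange(num_cams)          # random camera in that frame
        u = rng.uniform(0, width)              # random continuous pixel location
        v = rng.uniform(0, height)
        rays.append(Ray(offset, cam, u, v))
    return rays


def pixel_ray_direction(u, v, fx, fy, cx, cy):
    """Unit ray direction in the camera frame for pixel (u, v), pinhole intrinsics assumed."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)
```

For example, `sample_supervision_rays(1024, [-1, 0, 1], 6, 1600, 900, random.Random(0))` draws 1024 rays from the previous, current and next frames of a six-camera rig. Each ray would then be transformed into the current frame with the relative ego pose (not shown here) before being rendered against the predicted volume.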

Letian-Wang commented 3 months ago

Thanks so much for the insightful and quick response!!

It seems to me that SelfOcc is inherently different from supervised methods: SelfOcc starts from full occupancy and removes occupancy where the images/depth provide evidence of free space, while typical methods (e.g. LSS-based methods) start from empty occupancy and add occupancy where there is evidence of a surface. Is this understanding correct?
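The two regimes described in the question can be illustrated with a toy 1D ray of voxels. This is a deliberately simplified, hypothetical sketch using hard binary occupancy; it is not how SelfOcc (SDF/density with volume rendering) or LSS (depth distributions) is actually implemented, and it does not confirm the understanding above. It only shows why the two regimes leave unobserved space in different states.

```python
def carve_from_full(num_voxels, hit_indices):
    """Toy 'carving': start fully occupied; an observation that hits voxel k
    marks voxels 0..k-1 (free space in front of the surface) as empty."""
    occ = [True] * num_voxels
    for k in hit_indices:
        for i in range(k):
            occ[i] = False
    return occ


def lift_from_empty(num_voxels, hit_indices):
    """Toy 'lifting': start empty; an observation that hits voxel k marks only
    voxel k as occupied."""
    occ = [False] * num_voxels
    for k in hit_indices:
        occ[k] = True
    return occ
```

With 8 voxels and a surface observed at voxel 3, `carve_from_full(8, [3])` leaves voxels 3 to 7 occupied, while `lift_from_empty(8, [3])` marks only voxel 3. The voxels behind the surface were never observed, so the carving regime keeps them occupied and the lifting regime leaves them empty, which is one way to see why the starting state matters for occluded regions.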

Another question, regarding the visualization: how did you generate the demo video across both sweep and sample frames? Looking through the code, I guess info['prev/sample'] refers to the sample frame while info['prev_sample/next_sample'] refers to sweep frames, right? Also, since the data creation script does not seem to be available, I was wondering how you handled the variable number of sweep frames between sample frames, and how you synchronized the sweep cameras and lidars. Thanks!
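The thread does not include the authors' answer, so the following is only a sketch of one common approach, not SelfOcc's data pipeline. It assumes nuScenes-style `sample_data` records with `next`, `is_key_frame` and `timestamp` fields, and uses a plain dict in place of `nusc.get('sample_data', token)` so it runs without the devkit. Following the `next` links from one keyframe to the next naturally yields a variable number of sweeps per interval. Each camera sweep is then paired with the lidar sweep whose timestamp is closest, which is approximate rather than exact synchronization. The helper names are hypothetical.

```python
from bisect import bisect_left


def sweeps_between_keyframes(records, start_token):
    """Collect non-keyframe sweeps after a keyframe until the next keyframe.

    `records` maps token -> dict with 'next', 'is_key_frame', 'timestamp'
    (assumed to mirror the nuScenes sample_data table; '' means no next record).
    """
    sweeps = []
    token = records[start_token]['next']
    while token:
        rec = records[token]
        if rec['is_key_frame']:
            break  # reached the next sample (keyframe)
        sweeps.append(token)
        token = rec['next']
    return sweeps  # length varies from one keyframe interval to the next


def nearest_by_timestamp(query_ts, candidate_ts):
    """Index of the timestamp in candidate_ts (sorted ascending, non-empty)
    closest to query_ts; ties go to the earlier candidate."""
    i = bisect_left(candidate_ts, query_ts)
    if i == 0:
        return 0
    if i == len(candidate_ts):
        return len(candidate_ts) - 1
    before, after = candidate_ts[i - 1], candidate_ts[i]
    return i if after - query_ts < query_ts - before else i - 1
```

For example, a camera sweep at t = 83000 µs matched against lidar sweeps at `[0, 50000, 100000, 150000]` µs is paired with index 2 (t = 100000 µs), since that is 17 ms away versus 33 ms for the previous one.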