Inquiry on Applying IDA to Generated Rays in NeRF-based Models

pmj110119 / RenderOcc

[ICRA 2024] RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision. (Former version: UniOcc)


Inquiry on Applying IDA to Generated Rays in NeRF-based Models #38

Open SPA-junghokim opened 5 months ago

SPA-junghokim commented 5 months ago

I hope you're doing well. I'm writing to express my appreciation for your research on NeRF and its rendering techniques, particularly the application to OCC models. Your work has inspired my project, which explores integrating camera characteristics more effectively into model structures.

I have a specific question regarding the ray generation process when applying IDA. Should post-rotation and post-translation adjustments be applied to the generated rays after projection onto images? Understanding the implications of IDA on ray adjustments would greatly benefit my research.

I look forward to your insights and thank you in advance for your guidance.

Best, Jungho Kim

pmj110119 commented 5 months ago

Hi Kim, I believe it's unnecessary.

Rays are defined in the world coordinate system, not the camera coordinate system, so the IDA transformation, which is applied at the image level, doesn't affect them. IDA does affect the conversion of 2D image features into 3D volume features, and that step requires careful handling. For a ray generated from the original image, you simply read the color, class, depth, and other pixel label values directly from the corresponding pixel. This step involves no coordinate transformation.
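To make the point above concrete, here is a minimal NumPy sketch, not the RenderOcc implementation. It assumes a pinhole camera with intrinsics `K` and a 4x4 `cam_to_world` pose; the function names `generate_rays` and `gather_pixel_labels` are hypothetical. Rays are built from pixel coordinates of the original (un-augmented) image, and labels are read by plain indexing, so the IDA post-rotation and post-translation never enter the computation.

```python
import numpy as np


def generate_rays(pixels_uv, K, cam_to_world):
    """Build world-space rays for pixels of the ORIGINAL (un-augmented) image.

    pixels_uv:    (N, 2) array of (u, v) pixel coordinates.
    K:            (3, 3) pinhole intrinsics (assumed camera model).
    cam_to_world: (4, 4) camera-to-world pose.
    Returns ray origins (N, 3) and unit ray directions (N, 3).
    """
    n = pixels_uv.shape[0]
    uv1 = np.concatenate([pixels_uv.astype(np.float64), np.ones((n, 1))], axis=1)
    # Back-project pixels to camera-frame directions: d_cam = K^-1 [u, v, 1]^T
    dirs_cam = uv1 @ np.linalg.inv(K).T
    rotation, translation = cam_to_world[:3, :3], cam_to_world[:3, 3]
    # Rotate directions into the world frame and normalize them.
    dirs_world = dirs_cam @ rotation.T
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    # Every ray of one camera starts at the camera center.
    origins = np.broadcast_to(translation, (n, 3)).copy()
    return origins, dirs_world


def gather_pixel_labels(label_map, pixels_uv):
    """Read per-ray labels (depth, class, color, ...) by direct pixel indexing.

    label_map is indexed as [row (v), column (u)]; no IDA matrices are involved.
    """
    u = pixels_uv[:, 0].astype(int)
    v = pixels_uv[:, 1].astype(int)
    return label_map[v, u]
```

If the label maps were instead stored in the augmented image frame, the pixel coordinates would first have to be mapped between the two frames; the sketch assumes that is not the case.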

SPA-junghokim commented 5 months ago

I really appreciate your reply. I understand your explanation.

Regards, Jungho Kim

SPA-junghokim commented 5 months ago

I have another question: In the nerf_head, using the rendering loss versus not using it results in about a 7-fold difference in training time. Is it normal for it to take significantly longer?

pmj110119 commented 5 months ago

A 7x increase in time seems somewhat abnormal. Could you provide the time cost of one iteration, and is it similar to that observed in the example log?

SPA-junghokim commented 5 months ago

It has been confirmed to be a hardware issue with our own server. Thank you.

I have one more question: Do we have to use separate heads for density and semantics, or can we take the output of a single head, as in BEVDet-Occ, treat the regions not predicted as the empty class as the density volume, and render them?

pmj110119 commented 5 months ago

You can of course use the output of a single head and then split its channels to get the density and the semantic logits.

However, in our early experiments, using two MLPs to predict geometry (density) and semantics separately performed slightly better, though this has not been rigorously verified.
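The single-head option could look roughly like the NumPy sketch below. It rests on assumptions that are not from the repository: the head emits `1 + num_classes` channels per voxel, channel 0 is the raw density, and a softplus keeps the density non-negative. The name `split_head_output` is hypothetical.

```python
import numpy as np


def split_head_output(volume, num_classes):
    """Split a single-head output volume into density and semantic logits.

    volume: (..., 1 + num_classes) array; channel 0 is assumed to be the raw
            density and the remaining channels the per-class logits.
    Returns density (...) and semantic_logits (..., num_classes).
    """
    if volume.shape[-1] != 1 + num_classes:
        raise ValueError("expected 1 + num_classes channels in the last axis")
    # Softplus, log(1 + exp(x)), is an assumed activation that keeps density >= 0.
    density = np.logaddexp(0.0, volume[..., 0])
    semantic_logits = volume[..., 1:]
    return density, semantic_logits
```

With two separate MLPs, the two tensors would simply come from different networks instead of from a channel split; everything downstream of this function stays the same.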