vista-simulator / vista

Data-driven simulation for training and evaluating full-scale autonomous vehicles.
https://vista.csail.mit.edu

How to use the depth map and camera intrinsics to build a 3D world frame, so that Vista can apply the relative transformation to compute the next observation #10

Open Jung132914 opened 2 years ago

Jung132914 commented 2 years ago

Hi Amini, thanks for sharing this impressive and very useful project. I am very interested in the data-driven simulation of the RGB camera, so I read your paper "Learning Robust Control Policies for End-to-End Autonomous Driving from Data-Driven Simulation". This part confused me: "From the single closest monocular image, a depth map is estimated using a convolutional neural network using self-supervision of stereo cameras [28]. Using the estimated depth map and camera intrinsics, our algorithm projects from the sensor frame into the 3D world frame." Could you please give more details on how the depth map and camera intrinsics are used to build the 3D world frame, and how the relative transformation is then used to compute the next observation? Best regards
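For anyone else puzzling over this step: the quoted passage describes standard pinhole back-projection followed by a rigid-body reprojection. Below is a minimal NumPy sketch of that idea; the function names and argument conventions are illustrative assumptions, not Vista's actual implementation.

```python
import numpy as np

def unproject(depth, K):
    """Back-project a depth map into camera-frame 3D points.

    depth: (H, W) metric depth along the optical axis.
    K: (3, 3) camera intrinsics matrix.
    Returns (H*W, 3) points in the camera frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T       # camera-frame rays with z = 1
    return rays * depth.reshape(-1, 1)    # scale each ray by its depth

def reproject(pts_cam, K, T_new_old):
    """Apply a relative pose and project into the new camera's image.

    pts_cam: (N, 3) points in the old camera frame.
    T_new_old: (4, 4) relative transform from old to new camera frame.
    Returns (N, 2) pixel coordinates as seen from the new viewpoint.
    """
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    pts_new = (pts_h @ T_new_old.T)[:, :3]  # points in the new camera frame
    uvw = pts_new @ K.T
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide -> pixels
```

With an identity intrinsics matrix and an identity relative pose, `reproject(unproject(depth, K), K, T)` returns the original pixel grid; in the simulator the novel view is synthesized by sampling the source image at these reprojected coordinates.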

luyao2001 commented 1 year ago

Could you run the basic examples?