Question on the data used for this work

I am not an expert on 3D scene understanding, so apologies if this question is too basic. My understanding is that an RGB-D image consists of two parts: an ordinary 2D RGB image and a depth image, which is also a 2D array. How is a 3D image obtained from them and used as the input to the neural network? And if there are occlusions in the 2D images, are those occlusions also present in the resulting 3D image?
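
For anyone with the same question, here is a minimal sketch of one common way to lift a depth image into 3D: pinhole back-projection, where each pixel (u, v) with depth z maps to the point ((u - cx) * z / fx, (v - cy) * z / fy, z). This is a generic illustration and not necessarily what this repository does. The function name `backproject_depth`, the toy depth values, and the intrinsics `fx`, `fy`, `cx`, `cy` are all made up for the example.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a 2D depth image (rows of depth values) to a list of 3D points.

    Assumes a pinhole camera model. Pixels with depth <= 0 are treated
    as missing measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue                   # no depth reading at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth image; the intrinsics are illustrative values, not from any real sensor.
depth = [[1.0, 2.0],
         [0.0, 4.0]]
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(points)
```

Each pixel yields at most one 3D point, the first surface the camera ray hits, so anything hidden behind it in the 2D view is also missing from the resulting point cloud. Such a cloud could then be discretized into a voxel grid to form a fixed-size 3D input for a network, though whether that matches the pipeline used here is an assumption.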

shurans / DeepSlidingShape

Question on the data used for this work #29