valgur / surface-normal

Depth to normals preprocessing tool with Python and CUDA support for CVPR2019 paper 1899: DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image
MIT License

Question about input depth image #5

Open gujiaqivadin opened 4 years ago

gujiaqivadin commented 4 years ago

Hello, valgur! Thanks for sharing your code for computing surface normals. I have a question about the input depth image. I know it comes from the KITTI depth completion dataset, but I don't know whether it should be the input sparse depth map or the dense ground-truth depth map. Will the sparsity of the depth map affect the quality of the surface normals? My second question: when training our model we use cropped depth images, so can the tool compute correct surface normals on a 256x512 crop rather than at full scale?

valgur commented 4 years ago

the input sparse depth map or the dense ground-truth depth map. Will the sparsity of the depth map affect the quality of the surface normals?

Since you probably want to use the normal images as the ground truth for a model, you want them to be as high-quality as possible. In general, the denser, aggregated KITTI depth completion ground truth images will work much better for this. The surface normal estimation algorithm fits a local plane in an n×n window (the default window size is 15 px), so a denser map gives you more points per window for a more accurate estimate, and a better chance of having more than the minimum of 3 points required to fit a plane at all. The only situation where the sparser single-scan depth maps seem to be more accurate is in the presence of dynamic objects, where the noise from the imperfect point cloud aggregation in the denser depth map results in some "wobbliness" in the normals for the dynamic objects. Also, the sparse depth images have not been filtered to exclude points that should be occluded, but overlap with closer ones due to being transformed into the camera frame.
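
For reference, the window-based plane fit described above can be sketched in plain NumPy. This is an illustrative re-implementation of the general idea, not the repository's actual CUDA code, and the function name is made up:

```python
import numpy as np

def normal_from_window(depth, v, u, cx, cy, f, win=15):
    """Estimate the surface normal at pixel (v, u) by fitting a plane to the
    valid 3D points inside a win x win window. Illustrative only."""
    r = win // 2
    v0, u0 = max(v - r, 0), max(u - r, 0)
    patch = depth[v0:v + r + 1, u0:u + r + 1]
    pv, pu = np.nonzero(patch > 0)   # valid (non-zero) depth pixels
    if len(pv) < 3:                  # a plane fit needs at least 3 points
        return None
    z = patch[pv, pu]
    # Back-project the pixels into 3D camera coordinates.
    x = (pu + u0 - cx) * z / f
    y = (pv + v0 - cy) * z / f
    pts = np.stack([x, y, z], axis=1)
    # The plane normal is the direction of least variance of the points,
    # i.e. the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(pts - pts.mean(axis=0), full_matrices=False)
    n = vt[-1]
    return -n if n[2] > 0 else n     # orient the normal towards the camera
```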

can it compute correct surface normals on a 256x512 crop rather than at full scale?

If you are asking whether a model trained on cropped images can also process full-size images, then yes: the DeepLiDAR model is a convolutional model and is not limited to a fixed input size.
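
As a quick sanity check of that size-agnostic property, any fully convolutional network accepts varying input sizes. A generic PyTorch toy example, not the DeepLiDAR architecture:

```python
import torch
import torch.nn as nn

# A toy fully convolutional net (NOT the DeepLiDAR architecture): with no
# flatten/linear layers, the spatial size of the input is unconstrained.
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),  # 3 output channels, e.g. a normal map
)

print(net(torch.rand(1, 1, 256, 512)).shape)   # torch.Size([1, 3, 256, 512])
print(net(torch.rand(1, 1, 352, 1216)).shape)  # torch.Size([1, 3, 352, 1216])
```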

gujiaqivadin commented 4 years ago

1st question: Thanks for your detailed answer. If I want to supervise the surface normals for my output depth map, maybe I will use (sparse + GT) to generate more precise surface normal ground truth.

2nd question: Yes, I understand about the model input size, but my question had another meaning. If I want to supervise depth and surface normals in one pipeline, I need to generate surface normals from a cropped 256x512 image (because we use this size in the depth supervision pipeline). I see there are cx, cy, f arguments in the surface-normal function's input args, but for a cropped image these arguments lose their meaning, because we don't know where the cropped area is. So if I need to generate surface normals from a cropped depth image, do I need to store the crop offsets (th, tw) relative to the full-size image in the model pipeline?
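
For concreteness, if the crop offsets are stored, the intrinsics passed to the tool can simply be shifted by them. A minimal sketch, where th/tw are assumed to be the top-left row/column of the crop as in the comment above, and all numeric values are illustrative:

```python
def crop_intrinsics(cx, cy, f, th, tw):
    """Shift the principal point by the crop's top-left offset (th = top row,
    tw = left column of the crop in the full image). The focal length is
    unchanged, since cropping does not rescale pixels."""
    return cx - tw, cy - th, f

# Example with illustrative values: a 256x512 crop starting at row 100, col 300.
cx_c, cy_c, f_c = crop_intrinsics(cx=600.0, cy=180.0, f=720.0, th=100, tw=300)
```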

valgur commented 4 years ago
  1. Looking at a concrete sparse vs. dense depth input example for surface normal estimation: [images: normals_sparse, normals_dense] The sparse normals look more accurate to me and have better spatial coverage in some places, so in that sense they might work better as GT. Using just the sparse normals or combining sparse + dense might work quite well, but the overlapping occluded points are still a likely issue that might need to be corrected for.

  2. I agree, trying to predict normals in a cropped image without knowledge of the offset from the camera center and the focal length is rather questionable. The neural net will definitely learn to guess these values to some degree, but a better approach might be either

    • providing the angular offset from the image center in some form, or
    • modifying the definition of the normal values.

    By the latter I mean changing the coordinate frame in which the normal direction coordinates are provided, so that the z-direction of the frame points towards the 3D location of the normal instead of using the same camera frame for all points (see the sketch below). The perspective distortion at the image edges might still cause problems with this approach, though, I guess.
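
The per-pixel frame in the second bullet could be constructed roughly like this. It is a sketch of the idea under a pinhole-camera assumption with z pointing forward, not code from this repository, and the helper name is hypothetical:

```python
import numpy as np

def normal_to_view_frame(n, u, v, cx, cy, f):
    """Re-express a camera-frame normal n in a per-pixel frame whose z-axis
    points along the viewing ray through pixel (u, v). Hypothetical helper."""
    # Viewing ray through the pixel (pinhole camera, z forward).
    z_axis = np.array([(u - cx) / f, (v - cy) / f, 1.0])
    z_axis /= np.linalg.norm(z_axis)
    # Complete an orthonormal basis; x_axis points roughly image-right.
    # (Never degenerate here: the ray always has a positive z component.)
    x_axis = np.cross([0.0, 1.0, 0.0], z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis])  # rows = per-pixel frame axes
    return R @ n  # coordinates of the normal in the per-pixel frame
```
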
anthcolange commented 4 years ago

Hi, how does one access these dense depth GT images? I've only been able to find the sparse ones shown first in the above post. Thanks!