yzcjtr / GeoNet

Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
MIT License
726 stars 181 forks

How to get the real-world depth image #7

Closed xielinjiang closed 6 years ago

xielinjiang commented 6 years ago

Hi, thank you for your work! I read your code, but I couldn't find out how to get a real-world depth image. Looking forward to your reply. Thank you!

kristijanbartol commented 6 years ago

If you are talking about the KITTI dataset, try eval_depth.py on the KITTI test data, but change the code so that it saves the ground-truth disparities as images.

xielinjiang commented 6 years ago

Thanks for your answer. How should I change the code?

xielinjiang commented 6 years ago

Sorry, what I mean is generating a depth map from pred_depth. Simply put: given an input image, how do I output that image's depth (KITTI dataset)?

yzcjtr commented 6 years ago

The multi-scale prediction is exactly self.pred_depth. It differs from the real-world depth (in metric units) by an unknown scale factor. See here for reference.

xielinjiang commented 6 years ago

From the code: `scalor = np.median(gt_depth[mask]) / np.median(pred_depth[mask])` followed by `pred_depth[mask] *= scalor`. To calculate the scalor we must have gt_depth, but for an unknown image without gt_depth, how do I get its depth?
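As a runnable illustration of that median-scaling step (the depth values and the mask threshold below are made up for the example; in eval_depth.py the mask selects valid ground-truth pixels):

```python
import numpy as np

# Hypothetical depths: the prediction is "off" by a global factor of 2.
gt_depth = np.array([1.0, 2.0, 4.0, 80.0])
pred_depth = np.array([0.5, 1.0, 2.0, 40.0])
mask = gt_depth < 50.0  # e.g. keep only pixels within the evaluation range

# Rescale the prediction so its median matches the ground truth's median.
scalor = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
pred_depth[mask] *= scalor
```

This only works at evaluation time, when gt_depth is available; without it the scale stays unknown, which is exactly the ambiguity being discussed.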

yzcjtr commented 6 years ago

The scale ambiguity cannot be resolved unless you have prior knowledge; the same is true of most monocular SLAM methods.

xielinjiang commented 6 years ago

Does that mean that in your paper, the depth shown in the images is not the true depth?

yzcjtr commented 6 years ago

They are indeed the depth predictions from our method. They differ from the true depth by a relative scale. If you apply a normalization, i.e. `depth / np.mean(depth)`, to both the ground truth and the predicted depth, they should be similar.
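A minimal sketch of that mean normalization, with a synthetic depth map standing in for a real prediction:

```python
import numpy as np

# Synthetic example: a prediction that differs from the ground truth
# only by an unknown global scale factor.
gt_depth = np.array([[2.0, 4.0], [8.0, 6.0]])  # "true" depths
pred_depth = 0.3 * gt_depth                    # network output, arbitrary scale

# Dividing each map by its own mean removes the global scale,
# so the two become directly comparable.
gt_norm = gt_depth / np.mean(gt_depth)
pred_norm = pred_depth / np.mean(pred_depth)

print(np.allclose(gt_norm, pred_norm))  # True
```

After normalization both maps have mean 1, so only the relative structure, the part the network can actually learn from monocular input, is compared.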

kristijanbartol commented 6 years ago

If I understood it correctly, you need geonet_test_depth.py. There you specify a .txt file listing the images, and you get (depth) predictions for each of them. To get actual images, just save them to image files instead of storing them to .npy.
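A small sketch of that conversion; the helper name and the 1e-8 epsilon are my own, not from the repo:

```python
import numpy as np

def depth_to_uint8(depth):
    """Scale a depth map to [0, 255] so any image library can save it
    as an 8-bit grayscale picture."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return (d * 255).astype(np.uint8)

# Hypothetical usage with the [N, H, W] .npy array geonet_test_depth.py
# writes (the file name is assumed for illustration):
#   preds = np.load("depth_predictions.npy")
#   for i, d in enumerate(preds):
#       Image.fromarray(depth_to_uint8(d)).save(f"depth_{i:04d}.png")
```

Note the per-image min/max normalization discards the (already ambiguous) scale; it only produces a viewable visualization.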

xielinjiang commented 6 years ago

The depth predictions for those images from geonet_test_depth.py are not the true depth.

kristijanbartol commented 6 years ago

Ah, I see. That's a bit trickier. What you get in the KITTI dataset are point clouds (the velodyne/ directory). You can generate disparity / depth maps from them, though personally I failed to do it.
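For completeness, a hedged sketch of how such a projection usually works: transform the LiDAR points into the camera frame, project with the intrinsics, and splat depths into an image. The function, variable names, and the nearest-point rule are my own; KITTI ships the actual transform and intrinsics in its calibration files.

```python
import numpy as np

def project_lidar_to_depth(points, T_cam_velo, K, height, width):
    """Project LiDAR points (N, 3) into a sparse depth map.

    T_cam_velo: 4x4 transform from the LiDAR to the camera frame.
    K: 3x3 camera intrinsics.
    """
    # Homogeneous coordinates, transformed into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_cam_velo @ pts_h.T)[:3]          # (3, N)
    cam = cam[:, cam[2] > 0]                  # keep points in front of camera

    # Perspective projection with the intrinsics.
    uv = K @ cam
    u = (uv[0] / uv[2]).astype(int)
    v = (uv[1] / uv[2]).astype(int)
    z = cam[2]

    depth = np.zeros((height, width))
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # When several points hit the same pixel, keep the nearest one.
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

The result is sparse (LiDAR covers only part of the image), which is one reason KITTI depth evaluation uses a validity mask.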

xielinjiang commented 6 years ago

But in most application scenarios there are no point clouds, so in those scenarios we can't compute the true depth. So...

kristijanbartol commented 6 years ago

In general, it is hard to obtain ground truth for spatial features (depth, optical flow). That's why unsupervised methods are so popular now...

tongpinmo commented 6 years ago

I have some doubts. Have you tried using two images as input to predict depth? And what are the advantages and significance of single-view depth prediction? From the above discussion, there is no true depth for a single image. One more thing: I think single-view depth has more trouble generalizing to previously unseen types of images. For example, a model trained on outdoor driving sequences is unlikely to work well on indoor scenes.

kristijanbartol commented 6 years ago

There are two different approaches to depth estimation, depending on the available camera configuration:

- stereo / multi-view: two or more calibrated cameras, so depth can be recovered geometrically by triangulation;
- monocular: a single camera, where depth must be inferred from one image (or an image sequence).

Of course, you don't always have a calibrated camera system to feed stereo algorithms, so you are left with only a single image. It is actually a fascinating fact that you can use deep learning to estimate depth from single images (check this CVPR tutorial on structure-from-motion in the classical, non-deep-learning sense).

I think that single view depth has more problems generalizing to previously unseen types of images.

This statement touches one of the fundamental questions of deep learning. It is true not only for single-view depth but for many (almost all) problems deep learning tries to solve: you always want to generalize well to different datasets / sets of problems... But this is really a generic question and answer; please look for more general materials about deep learning.

From the above discussion, there is no true depth for a single image.

You are right: what you have, given a single image as input, is relative disparity / depth information. This is, again, because of the scale ambiguity mentioned above.

I hope I shed some light on your confusion, and I warmly suggest this course to answer these and many more of your questions about computer vision.

yzcjtr commented 6 years ago

@kristijanbartol Thanks for your detailed and organized explanation. I learned a lot from your answer as well.

tongpinmo commented 6 years ago

@kristijanbartol Thank you for your detailed explanation, I have learned a lot. Since there is scale ambiguity in single images, why do so many researchers study it so deeply? We can't apply it to real scenes like autonomous driving. Maybe it's just exploratory, or do we use deep learning methods despite the scale ambiguity? And if we can describe it with geometric methods, is it necessary to use deep learning?

kristijanbartol commented 6 years ago

Check this paper (size-to-depth). What I'm saying is that there are ways to recover depth from monocular images, but you need additional information, which in this paper is the size of a particular object in the image. Generally, this information is called prior knowledge.
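The size-to-depth idea follows from the plain pinhole-camera relation; here is a minimal sketch assuming a known object height (the function and the numbers are illustrative, not the paper's method):

```python
def depth_from_size(focal_px, real_height_m, image_height_px):
    """Pinhole relation: an object of real height H appearing h pixels
    tall under focal length f (in pixels) lies at depth Z = f * H / h."""
    return focal_px * real_height_m / image_height_px

# A 1.5 m tall car spanning 100 px under a ~721 px focal length
# (a typical KITTI value) is roughly 10.8 m away.
print(depth_from_size(721.0, 1.5, 100.0))
```

The known real-world height is exactly the kind of prior knowledge that breaks the scale ambiguity.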

If we can describe it with a geometric method, is it necessary for us to use deep learning?

Of course it's not necessary. We use it because it has proven to work well (better and faster than classical algorithms) on many interesting problems, including monocular "depth" estimation.

tongpinmo commented 6 years ago

Ok, that's very nice of you, I have learned a lot!