If you are talking about using the KITTI dataset, try eval_depth.py on the KITTI test data, but change the code so that it saves the ground-truth disparities to images.
Thanks for your answer, how can I change the code?
Sorry, what I mean is generating a depth map from pred_depth. Simply put: given an input image, how do I output the depth of that image (KITTI dataset)?
The multi-scale prediction is exactly self.pred_depth. It differs from the real-world (metric) depth by an unknown scale factor. See here for reference.
From the code:

```python
# rescale predictions by the median ratio to ground truth
scalor = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
pred_depth[mask] *= scalor
```

If we want to calculate the scalor, we must have gt_depth. But if there is an unknown image without gt_depth, how do I get the depth of that image?
The scale ambiguity cannot be resolved unless you have prior knowledge, which is the case in most monocular SLAM methods.
Does that mean that in your paper, the depth shown in the image is not the true depth?
They are indeed the depth predictions of our method. They differ from the true depth by a relative scale. If you perform a normalization, i.e. depth / np.mean(depth), on both the ground truth and the predicted depth, they should be similar.
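As a minimal sketch of that comparison (assuming pred_depth and gt_depth are already loaded as aligned NumPy arrays; the variable names follow the snippet quoted earlier):

```python
import numpy as np

# pred_depth: network output; gt_depth: ground truth; both HxW arrays
# aligned to the same image (names are illustrative).
mask = gt_depth > 0   # compare only pixels with valid ground truth

# Divide out the unknown global scale from both maps.
pred_norm = pred_depth / np.mean(pred_depth[mask])
gt_norm = gt_depth / np.mean(gt_depth[mask])

# After normalization the two maps should agree up to prediction error.
rel_err = np.mean(np.abs(pred_norm[mask] - gt_norm[mask]) / gt_norm[mask])
print('mean relative error after normalization:', rel_err)
```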
If I understood it correctly, you need geonet_test_depth.py. There you specify a .txt file with an image list and you get (depth) predictions for each of these images. To get actual images, just save them to image files instead of storing them to .npy.
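In case it helps, a rough sketch of turning a saved .npy prediction into a viewable image (the file names here are placeholders, not the script's actual outputs):

```python
import numpy as np
import matplotlib.pyplot as plt

depth = np.load('predictions.npy')   # placeholder path; shape (H, W) or (N, H, W)
if depth.ndim == 3:
    depth = depth[0]                 # pick one image from the batch

# Normalize to [0, 1] purely for visualization; the absolute scale is
# meaningless anyway because of the scale ambiguity discussed above.
depth_vis = (depth - depth.min()) / (depth.max() - depth.min())
plt.imsave('depth_0.png', depth_vis, cmap='plasma')
```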
But the depth predictions for each of these images from geonet_test_depth.py are not the true depth.
Ah, I see. That's a bit more tricky. What you get in the KITTI dataset are point clouds (the velodyne/ directory). You can generate disparity / depth maps from them, but personally I failed to do it.
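For reference, the usual recipe is to project the velodyne points into the image plane with the calibration matrices shipped with KITTI. A rough sketch, assuming the object-devkit calib file format (P2, R0_rect, Tr_velo_to_cam) and placeholder paths; the raw-data calib files use slightly different names:

```python
import numpy as np

def read_calib(path):
    """Parse a KITTI object-devkit calib file into a name -> array dict."""
    mats = {}
    with open(path) as f:
        for line in f:
            if ':' in line:
                name, vals = line.split(':', 1)
                mats[name] = np.array([float(x) for x in vals.split()])
    return mats

calib = read_calib('calib/000000.txt')              # placeholder path
P2 = calib['P2'].reshape(3, 4)                      # left color cam projection
R0 = np.eye(4); R0[:3, :3] = calib['R0_rect'].reshape(3, 3)
Tr = np.vstack([calib['Tr_velo_to_cam'].reshape(3, 4), [0, 0, 0, 1]])

# Velodyne scans are Nx4 float32 binaries: x, y, z, reflectance.
pts = np.fromfile('velodyne/000000.bin', np.float32).reshape(-1, 4)
pts = pts[pts[:, 0] > 0]                            # keep points in front of the car
pts_h = np.hstack([pts[:, :3], np.ones((len(pts), 1), np.float32)])

img_pts = P2 @ R0 @ Tr @ pts_h.T                    # 3xN projection to the image
z = img_pts[2]
u = np.round(img_pts[0] / z).astype(int)
v = np.round(img_pts[1] / z).astype(int)

H, W = 375, 1242                                    # typical KITTI image size
valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
depth_map = np.zeros((H, W), np.float32)            # sparse ground-truth depth
depth_map[v[valid], u[valid]] = z[valid]            # naively overwrites duplicates
```

The official devkits do the same projection but also keep the nearest point when several land on one pixel; this sketch just overwrites.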
But in most application scenarios there are no point clouds, so in those scenarios we can't compute the true depth. So...
In general, it is hard to obtain ground truth for spatial features (depth, optical flow). That's why unsupervised methods are so popular now...
I have some doubts. Have you tried using two images as input to predict depth, and what are the advantages and significance of single-view depth prediction? From the above discussion, there is no true depth for a single image. One more thing: I think single-view depth has more problems generalizing to previously unseen types of images. For example, a model trained on outdoor driving sequences is unlikely to work well on indoor scenes.
There are two different approaches to depth estimation based on the available camera configuration:

1. Stereo depth estimation: with a calibrated camera pair, metric depth can be recovered by triangulating matched points between the two views.
2. Monocular depth estimation: with a single camera, depth can only be recovered up to an unknown scale.
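To make the distinction concrete, here is a toy sketch of the stereo case, where metric depth follows directly from disparity once the rig is calibrated (the focal length and baseline below are made-up, KITTI-like values):

```python
import numpy as np

f = 721.5   # focal length in pixels (illustrative value)
B = 0.54    # stereo baseline in meters (illustrative value)

disparity = np.array([96.2, 48.1, 12.0])   # pixel disparities of three points

# Classic stereo triangulation: depth = f * B / disparity.
depth = f * B / disparity
print(depth)   # ~[4.05, 8.10, 32.47] meters; the absolute scale comes from B
```

The baseline B is what anchors the result in meters; with a single camera there is no such anchor, hence the scale ambiguity.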
Of course, you don't always have a calibrated camera system to feed stereo algorithms, so you are left with only a single image. It is actually a fascinating fact that you can use deep learning to estimate depth from single-image sequences (check this CVPR tutorial about structure-from-motion in the classical, non-deep-learning sense).
> I think single-view depth has more problems generalizing to previously unseen types of images.
This statement touches one of the fundamental questions of deep learning. It is true not only for single-view depth but for many (almost all) other problems deep learning is trying to solve: you always want to generalize well across different datasets / sets of problems... But this is really a generic question and answer; please look for more relevant material about deep learning.
> From the above discussion, there is no true depth for a single image.
You are right: what you get, given a single image as input, is relative disparity / depth information. This is, again, because of the scale ambiguity mentioned above.
I hope I shed some light on your confusion and I warmly suggest this course to answer these and much more of your questions about computer vision.
@kristijanbartol Thanks for your detailed and organized explanation. I learn a lot from your answer as well.
@kristijanbartol Thank you for your detailed explanation, I have learned a lot. Since there is scale ambiguity in a single image, why do so many researchers dig deep into it? We can't apply it to real scenes like automatic driving. Maybe it's just an attempt, or do we use deep learning methods because of the scale ambiguity? If we can describe it with geometric methods, is it necessary to use deep learning?
Check this paper (size-to-depth). What I'm saying is that there are ways to recover depth from monocular images, but you need additional information, which in this paper is the size of a particular object in the image. Generally, this information is called prior knowledge.
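As a toy illustration of how such a prior pins down the scale (a plain pinhole-camera calculation with made-up numbers, not the paper's actual method):

```python
# Pinhole camera: pixel_size = f * real_size / depth, so knowing the
# real size of an object in the scene recovers its absolute depth.
f = 700.0            # focal length in pixels (illustrative)
real_height = 1.5    # prior knowledge: this car is about 1.5 m tall
pixel_height = 52.5  # measured height of the car in the image, in pixels

depth = f * real_height / pixel_height
print(depth)         # 20.0 meters of absolute depth from a single image
```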
> If we can describe it with geometric methods, is it necessary to use deep learning?
Of course it's not necessary. We use it because it has proven to work well (better and faster than classical algorithms) on many interesting problems, including monocular "depth" estimation.
OK, it's very nice of you, I have learned a lot!
Hi, thank you for your work! I read your code, but I didn't find out how to get a real-world depth image. Looking forward to your reply. Thank you!