EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image in a stereo pair. Details are described in our paper:
Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai
ICCV 2021 (arxiv)
EPCDepth produces the most accurate and sharpest results. In the last example, the depth of the person in the second red box should be greater than that of the road sign, because the road sign occludes the person. Only our model accurately captures this occlusion cue.
You can download the raw KITTI dataset (about 175GB) by running:
```shell
wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"
```
Then, we recommend converting the png images to jpeg with this command:
```shell
find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
```
Alternatively, you can skip this conversion step by manually changing the image suffix from `.jpg` to `.png` in `dataset/kitti_dataset.py`. Note that our pre-trained model was trained on jpg images, so its test performance on png images will decrease slightly.
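If ImageMagick or GNU parallel is not available, a pure-Python alternative with Pillow is sketched below; it is not part of the repository, and the quality and subsampling settings are chosen to roughly mirror the command above.

```python
# Hypothetical Pillow-based alternative to the ImageMagick + GNU parallel command above.
from pathlib import Path
from PIL import Image

kitti_path = Path("<your kitti path>")  # replace with your actual path
for png in kitti_path.rglob("*.png"):
    img = Image.open(png).convert("RGB")
    # quality/subsampling roughly match `-quality 92 -sampling-factor 2x2,1x1,1x1`
    img.save(png.with_suffix(".jpg"), quality=92, subsampling=2)
    png.unlink()  # remove the original .png, as `rm {}` does above
```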
Once you have downloaded the KITTI dataset as in the previous step, you need to prepare the depth hints by running:
```shell
python precompute_depth_hints.py --data_path <your kitti path>
```
The generated depth hints will be saved to `<your kitti path>/depth_hints`. Again, pay attention to the image suffix here.
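To verify that the precomputation finished, you can load one of the generated hints as sketched below; the per-image `.npy` layout is an assumption, so confirm the actual output format of `precompute_depth_hints.py` on your machine.

```python
# Hypothetical check of the precomputed depth hints; the per-image .npy layout is an
# assumption -- confirm the actual output format of precompute_depth_hints.py.
from pathlib import Path
import numpy as np

hint_dir = Path("<your kitti path>") / "depth_hints"
hint_files = sorted(hint_dir.rglob("*.npy"))
print(f"found {len(hint_files)} depth hint files")
if hint_files:
    hint = np.load(hint_files[0])
    print("example hint shape:", hint.shape)
```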
Download our pre-trained model and put it in `<your model path>`.
| Pre-trained | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| model18_lr | √ | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| | | | | d2 | 0.1 | 0.712 | 4.462 | 0.886 |
| model18 | √ | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| | | | | d2 | 0.0920 | 0.655 | 4.268 | 0.898 |
| model50 | √ | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
| | | | | d2 | 0.0905 | 0.629 | 4.187 | 0.900 |
Note: `pt` refers to pre-trained on ImageNet, and the low-resolution results differ slightly from those in the paper.
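If you just want to confirm that the download is intact before running evaluation, a minimal inspection with PyTorch is sketched below; it assumes the checkpoint is a regular `torch.save` archive that is either a bare state dict or a dict wrapping one (the exact key names may differ).

```python
# Hypothetical inspection of the downloaded checkpoint; assumes a standard torch.save
# archive that is either a bare state dict or a dict wrapping one.
import torch

ckpt = torch.load("<your model path>/model18.pth.tar", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} entries; first few keys:")
for name in list(state)[:5]:
    print(" ", name)
```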
To recreate the results from our paper, run:
```shell
python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>
```
This operation saves the estimated disparity maps to `<your disparity save path>` as a numpy array of shape (N, H, W).
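You can inspect the saved array directly with NumPy, for example as below; the file name is illustrative, so use whatever file actually appears under `<your disparity save path>`.

```python
# Load and inspect the disparities saved by the evaluation run above
# (the file name below is illustrative; use whatever the run actually produced).
import numpy as np

disps = np.load("<your disparity save path>/disps.npy")
print("shape (N, H, W):", disps.shape)
print("disparity range:", float(disps.min()), "to", float(disps.max()))
```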
We validate the generalization ability on the NYU-Depth-V2 dataset using the model trained on the KITTI dataset. Download the testing data nyu_test.tar.gz and unzip it to `<your nyuv2 testing data path>`. All evaluation code is in the `nyuv2Testing` folder. Run:
```shell
python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>
```
By default, only the visualization results (in png format) of the predicted disparity and the ground truth on the NYUv2 dataset are saved to `<your nyuv2 disparity save path>`.
You can download our precomputed disparity predictions from the following links:
| Disparity | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| disps18_lr | √ | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| disps18 | √ | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| disps50 | √ | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
To visualize the disparity maps saved in the KITTI evaluation (or other disparities in numpy data format), run:
```shell
python main.py --vis --disps_path <your disparity save path>/disps50.npy
```
The visualized depth maps will be saved to `<your disparity save path>/disps_vis` in png format.
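If you prefer to render the maps yourself (for example with a different colormap), a small stand-alone matplotlib sketch is shown below; it is independent of `main.py --vis`, and the output directory name and colormap are arbitrary choices.

```python
# Hypothetical stand-alone visualization of a saved disparity array, independent of
# `main.py --vis`; output directory name and colormap are arbitrary.
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt

disps = np.load("<your disparity save path>/disps50.npy")  # shape (N, H, W)
out_dir = Path("<your disparity save path>") / "disps_vis_custom"
out_dir.mkdir(parents=True, exist_ok=True)
for i, disp in enumerate(disps):
    plt.imsave(out_dir / f"{i:04d}.png", disp, cmap="magma")
```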
To train the model from scratch, run:
```shell
python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale
```
If you find our work useful in your research, please consider citing our paper:
```
@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}
```
Our depth hint module builds on DepthHints, our NYUv2 pre-processing on P2Net, and our RSU block on U2Net.