MonoDepth-FPN-PyTorch


A simple end-to-end model, implemented in PyTorch, that achieves state-of-the-art performance in depth prediction. We use a Feature Pyramid Network (FPN) backbone to estimate a depth map from a single RGB input image. We evaluated the model on the NYU Depth V2 Dataset (official split) and the KITTI Dataset (Eigen split).

Requirements

To Run

python3 main_fpn.py --cuda --bs 6

To continue training from a saved model, use

python3 main_fpn.py --cuda --bs 6 --r True --checkepoch 10

To visualize the reconstructed depth maps, run the Jupyter notebook vis.ipynb.

Data Processing

NYU Depth V2 Dataset

KITTI Dataset

Model
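
The model itself is defined in main_fpn.py. As a rough illustration of the architecture described above, the sketch below builds a Feature Pyramid Network on a ResNet-101 backbone and decodes the merged pyramid features into a one-channel depth map; the class and layer names are ours, not the repository's actual API.

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class FPNDepth(nn.Module):
    """Illustrative FPN-based depth estimator (not the repo's exact architecture)."""

    def __init__(self, out_channels=256):
        super().__init__()
        resnet = torchvision.models.resnet101(weights=None)
        # Backbone stages producing the C2..C5 feature maps.
        self.layer0 = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4
        # 1x1 lateral convolutions projecting each stage to a common width.
        self.lat5 = nn.Conv2d(2048, out_channels, 1)
        self.lat4 = nn.Conv2d(1024, out_channels, 1)
        self.lat3 = nn.Conv2d(512, out_channels, 1)
        self.lat2 = nn.Conv2d(256, out_channels, 1)
        # Smoothing conv and the final one-channel depth head.
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)
        self.predict = nn.Conv2d(out_channels, 1, 3, padding=1)

    @staticmethod
    def _upsample_add(top, lateral):
        # Upsample the coarser map and merge it with the lateral feature.
        return F.interpolate(top, size=lateral.shape[2:], mode="nearest") + lateral

    def forward(self, x):
        c2 = self.layer1(self.layer0(x))
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        p4 = self._upsample_add(self.lat5(c5), self.lat4(c4))
        p3 = self._upsample_add(p4, self.lat3(c3))
        p2 = self._upsample_add(p3, self.lat2(c2))
        depth = self.predict(self.smooth(p2))  # depth at 1/4 input resolution
        return F.interpolate(depth, size=x.shape[2:], mode="bilinear", align_corners=False)
```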

Loss Function

The loss function is a weighted sum of three terms: a depth loss, a gradient loss, and a surface-normal loss.

Depth Loss

$$L_{\mathrm{depth}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\log d_i - \log \hat{d}_i\big)^2}$$

The depth loss is the RMSE in log scale, where $d_i$ and $\hat{d}_i$ are the ground-truth and predicted depths at pixel $i$; we found it converges better than the L1 and L2 norms. Supervising in log scale makes the network focus more on closer objects.
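
In code, this term is a straightforward RMSE over log depths. A minimal sketch (the eps clamp is our addition to keep the logarithm well defined; the repository's implementation may differ):

```python
import torch

def depth_loss(pred, target, eps=1e-6):
    """RMSE between predicted and ground-truth depths in log space."""
    # Clamp so log() stays finite for non-positive values.
    diff = torch.log(pred.clamp(min=eps)) - torch.log(target.clamp(min=eps))
    return torch.sqrt(torch.mean(diff ** 2))
```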

Gradient Loss

$$L_{\mathrm{grad}} = \frac{1}{n}\sum_{i=1}^{n}\big(\lvert \nabla_x d_i - \nabla_x \hat{d}_i \rvert + \lvert \nabla_y d_i - \nabla_y \hat{d}_i \rvert\big)$$

The gradients of the depth maps are obtained with a Sobel filter; the gradient loss is the L1 norm of the difference between the predicted and ground-truth gradients.
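
A sketch of this term with explicit 3x3 Sobel kernels, assuming (N, 1, H, W) depth tensors (function and variable names are ours):

```python
import torch
import torch.nn.functional as F

# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def gradient_loss(pred, target):
    """L1 difference between Sobel gradients of prediction and ground truth."""
    kx, ky = SOBEL_X.to(pred.device), SOBEL_Y.to(pred.device)
    gx_p, gy_p = F.conv2d(pred, kx, padding=1), F.conv2d(pred, ky, padding=1)
    gx_t, gy_t = F.conv2d(target, kx, padding=1), F.conv2d(target, ky, padding=1)
    return (gx_p - gx_t).abs().mean() + (gy_p - gy_t).abs().mean()
```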

Surface Normal Loss

$$L_{\mathrm{normal}} = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{\langle n_i, \hat{n}_i \rangle}{\lVert n_i \rVert \, \lVert \hat{n}_i \rVert}\right)$$

We also employ the surface-normal loss proposed by Hu et al., where $n_i$ and $\hat{n}_i$ are the normals derived from the ground-truth and predicted depth gradients; it helps refine fine details.
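
A sketch of the idea, deriving normals from finite-difference depth gradients and penalizing one minus their cosine similarity (a simplification of Hu et al.'s formulation, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def normal_loss(pred, target):
    """1 - cosine similarity between normals derived from depth gradients."""
    def normals(d):
        # Finite-difference gradients of a (N, 1, H, W) depth map;
        # the (unnormalized) surface normal is (-dx, -dy, 1).
        dx = d[..., :, 1:] - d[..., :, :-1]
        dy = d[..., 1:, :] - d[..., :-1, :]
        dx, dy = dx[..., :-1, :], dy[..., :, :-1]  # crop to a common size
        n = torch.stack((-dx, -dy, torch.ones_like(dx)), dim=1)
        return F.normalize(n, dim=1)  # unit normals, shape (N, 3, 1, H-1, W-1)
    return (1 - (normals(pred) * normals(target)).sum(dim=1)).mean()
```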

The weights of the three loss terms were set in a 1:1:1 ratio.

Qualitative Evaluation

KITTI

Comparison with state-of-the-art methods:

More comparisons:

Quantitative Evaluation

KITTI

We use the standard depth evaluation metrics of Eigen et al.:

- Threshold accuracy: % of $d_i$ s.t. $\max\left(\frac{\hat{d}_i}{d_i}, \frac{d_i}{\hat{d}_i}\right) = \delta < thr$ for $thr = 1.25,\ 1.25^2,\ 1.25^3$
- Absolute relative difference: $\frac{1}{T}\sum_{i} \lvert \hat{d}_i - d_i \rvert / d_i$
- Squared relative difference: $\frac{1}{T}\sum_{i} (\hat{d}_i - d_i)^2 / d_i$
- RMSE (linear): $\sqrt{\frac{1}{T}\sum_{i} (\hat{d}_i - d_i)^2}$
- RMSE (log): $\sqrt{\frac{1}{T}\sum_{i} (\log \hat{d}_i - \log d_i)^2}$

where $T$ is the number of valid pixels in the test set.
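
For reference, the metrics can be computed as in the following sketch (a generic implementation over valid pixels, not the repository's evaluation script):

```python
import torch

def eigen_metrics(pred, gt, eps=1e-6):
    """Standard Eigen et al. depth metrics over valid (gt > 0) pixels."""
    valid = gt > 0
    pred, gt = pred[valid].clamp(min=eps), gt[valid]
    thresh = torch.max(pred / gt, gt / pred)
    return {
        "delta<1.25": (thresh < 1.25).float().mean(),
        "abs_rel": ((pred - gt).abs() / gt).mean(),
        "sq_rel": ((pred - gt) ** 2 / gt).mean(),
        "rmse": ((pred - gt) ** 2).mean().sqrt(),
        "rmse_log": ((pred.log() - gt.log()) ** 2).mean().sqrt(),
    }
```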

Discussion

Related Work

References
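
- D. Eigen, C. Puhrsch, and R. Fergus. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. NIPS 2014.
- J. Hu, M. Ozay, Y. Zhang, and T. Okatani. Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries. WACV 2019.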