PyTorch implementation of our paper:
AANet: Adaptive Aggregation Network for Efficient Stereo Matching, CVPR 2020
Authors: Haofei Xu and Juyong Zhang
11/15/2022 Update: Check out our new work: Unifying Flow, Stereo and Depth Estimation and code: unimatch for performing stereo matching with our new GMStereo model. The CUDA op in AANet is no longer required. 10 pretrained GMStereo models with different speed-accuracy trade-offs are also released. Check out our Colab and HuggingFace demo to play with GMStereo in your browser!
We propose a sparse points based intra-scale cost aggregation (ISA) module and a cross-scale cost aggregation (CSA) module for efficient and accurate stereo matching.
The implementation of the improved version, AANet+ (stronger performance and slightly faster speed), is also included in this repo.
Modular design
We decompose the end-to-end stereo matching framework into five components:
feature extraction, cost volume construction, cost aggregation, disparity computation and disparity refinement.
One can easily construct a customized stereo matching model by combining different components.
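As a rough illustration of this decomposition, the sketch below chains the five stages as swappable callables. The `StereoPipeline` class and the toy string-based stages are hypothetical and are not part of this repo; in practice each stage would be a PyTorch module, and the exact arguments passed between stages may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[..., Any]


@dataclass
class StereoPipeline:
    """Hypothetical container chaining the five stages; each field is swappable."""
    feature_extraction: Stage
    cost_volume_construction: Stage
    cost_aggregation: Stage
    disparity_computation: Stage
    disparity_refinement: Stage

    def __call__(self, left, right):
        # The same feature extractor is applied to both views.
        left_feat = self.feature_extraction(left)
        right_feat = self.feature_extraction(right)
        cost = self.cost_volume_construction(left_feat, right_feat)
        cost = self.cost_aggregation(cost)
        disparity = self.disparity_computation(cost)
        # Assumption: refinement sees the input images and the coarse disparity.
        return self.disparity_refinement(left, right, disparity)


# Toy stand-ins so the sketch runs without PyTorch.
pipeline = StereoPipeline(
    feature_extraction=lambda img: f"feat({img})",
    cost_volume_construction=lambda l, r: f"cost({l}, {r})",
    cost_aggregation=lambda c: f"agg({c})",
    disparity_computation=lambda c: f"disp({c})",
    disparity_refinement=lambda l, r, d: f"refine({d})",
)
result = pipeline("L", "R")
print(result)
```

Swapping a component then amounts to passing a different callable for one field while leaving the others unchanged.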
High efficiency
Our method processes a KITTI stereo pair (384x1248 resolution) in 60 ms!
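To put the quoted latency in perspective, the short calculation below converts it into throughput. It uses only the two numbers stated above (60 ms and 384x1248); the variable names are illustrative.

```python
height, width = 384, 1248   # KITTI resolution quoted above
latency_s = 60e-3           # 60 ms per stereo pair, as quoted above

pairs_per_second = 1.0 / latency_s
pixels_per_image = height * width
megapixels_per_second = pixels_per_image / latency_s / 1e6

print(f"{pairs_per_second:.1f} stereo pairs per second")
print(f"{pixels_per_image} pixels per image")
print(f"{megapixels_per_second:.2f} megapixels of disparity per second")
```

This works out to roughly 16.7 stereo pairs per second at that resolution.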
Full framework
Code for training, validation, evaluation, inference and prediction on any stereo pair is provided!
Our code is based on PyTorch 1.2.0, CUDA 10.0 and Python 3.7.
We recommend using conda for installation:
conda env create -f environment.yml
After installing dependencies, build deformable convolution:
cd nets/deform_conv && bash build.sh
Download Scene Flow, KITTI 2012 and KITTI 2015 datasets.
Our folder structure is as follows:
data
├── KITTI
│   ├── kitti_2012
│   │   └── data_stereo_flow
│   ├── kitti_2015
│   │   └── data_scene_flow
└── SceneFlow
    ├── Driving
    │   ├── disparity
    │   └── frames_finalpass
    ├── FlyingThings3D
    │   ├── disparity
    │   └── frames_finalpass
    └── Monkaa
        ├── disparity
        └── frames_finalpass
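Before launching a long training run, it can help to confirm that the dataset root matches this layout. The following is a minimal sketch using only the standard library; the helper name `missing_dataset_dirs` is hypothetical and not part of this repo, and the directory list is copied from the tree above.

```python
from pathlib import Path

# Leaf directories copied from the folder structure above.
EXPECTED_DIRS = [
    "KITTI/kitti_2012/data_stereo_flow",
    "KITTI/kitti_2015/data_scene_flow",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
]


def missing_dataset_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```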
If you would like to use the pseudo ground truth supervision introduced in our paper, you can download the pre-computed disparity on the KITTI 2012 and KITTI 2015 training sets here: KITTI 2012, KITTI 2015.
For KITTI 2012, you should place the unzipped file disp_occ_pseudo_gt under the kitti_2012/data_stereo_flow/training directory.
For KITTI 2015, you should place disp_occ_0_pseudo_gt under kitti_2015/data_scene_flow/training.
It is recommended to symlink your dataset root to $AANET/data:
ln -s $YOUR_DATASET_ROOT data
Otherwise, you may need to change the corresponding paths in the scripts.
All pretrained models are available in the model zoo.
We assume the downloaded weights are located under the pretrained directory.
Otherwise, you may need to change the corresponding paths in the scripts.
To generate prediction results on the test sets of the Scene Flow and KITTI datasets, you can run scripts/aanet_inference.sh.
The inference results on the KITTI dataset can be directly submitted to the online evaluation server for benchmarking.
We also support predicting on any rectified stereo pair. scripts/aanet_predict.sh provides example usage.
All training scripts for the Scene Flow and KITTI datasets are provided in scripts/aanet_train.sh.
Note that we use 4 NVIDIA V100 GPUs (32G) with batch size 64 for training. You may need to tune the batch size according to your hardware.
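As a starting point for that tuning, one rough heuristic is to scale the batch size in proportion to total GPU memory. The sketch below is not from this repo: it assumes the batch size of 64 is the total across the 4 GPUs and that memory use grows roughly linearly with batch size, and the helper name `suggest_batch_size` is hypothetical. Treat the result as an initial guess to be adjusted if you run out of memory.

```python
REFERENCE_GPUS = 4          # from the text above
REFERENCE_GPU_MEM_GB = 32   # V100 (32G), from the text above
REFERENCE_BATCH_SIZE = 64   # assumption: total batch size across all GPUs


def suggest_batch_size(num_gpus, gpu_mem_gb):
    """Scale the reference batch size by total GPU memory (heuristic only)."""
    per_gb = REFERENCE_BATCH_SIZE / (REFERENCE_GPUS * REFERENCE_GPU_MEM_GB)
    scaled = int(per_gb * num_gpus * gpu_mem_gb)
    # Round down to a multiple of num_gpus, keeping at least one sample per GPU.
    return max(num_gpus, scaled - scaled % num_gpus)


if __name__ == "__main__":
    for gpus, mem in [(4, 32), (2, 16), (1, 11)]:
        print(f"{gpus} GPU(s) x {mem} GB -> batch size {suggest_batch_size(gpus, mem)}")
```

A smaller batch size may also call for adjusting the learning rate, which this sketch does not attempt.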
We support using tensorboard to monitor and visualize the training process. You can first start a tensorboard session with
tensorboard --logdir checkpoints
and then access http://localhost:6006 in your browser.
How to train on my own data?
You can first generate a filename list by creating a data reading function in filenames/generate_filenames.py (an example on the KITTI dataset is provided), and then create a new dataset dictionary in dataloader/dataloader.py.
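A filename list for a custom dataset could be produced along the following lines. This is a sketch under stated assumptions, not the code in filenames/generate_filenames.py: it assumes one sample per line with space-separated left image, right image and disparity paths relative to the dataset root, and the sub-directory names `left`, `right` and `disp` as well as both function names are hypothetical. Check the provided KITTI example for the format the dataloader actually expects.

```python
from pathlib import Path


def build_filename_list(data_root, left_dir="left", right_dir="right", disp_dir="disp"):
    """Pair left/right/disparity files that share a name (assumed layout)."""
    root = Path(data_root)
    lines = []
    for left in sorted((root / left_dir).glob("*.png")):
        right = root / right_dir / left.name
        disp = root / disp_dir / left.name
        if not (right.is_file() and disp.is_file()):
            continue  # skip samples without a matching right image or disparity
        paths = (left, right, disp)
        lines.append(" ".join(p.relative_to(root).as_posix() for p in paths))
    return lines


def write_filename_list(lines, output_file):
    """Write one sample per line."""
    Path(output_file).write_text("\n".join(lines) + "\n")
```

For example, `write_filename_list(build_filename_list("data/MyData"), "filenames/MyData_train.txt")` would write the list to a file whose name and location are again only illustrative.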
How to develop new components?
Our framework makes it easy to develop new components, e.g., a new feature extractor, cost aggregation module or refinement architecture. You can 1) create a new file (e.g., my_aggregation.py) under the nets directory, 2) import the module in nets/aanet.py and 3) use it in the model definition.
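For step 1, a new aggregation module might look like the sketch below. Everything here is an assumption for illustration: the class name `MyAggregation`, the single-scale cost volume of shape [B, D, H, W], and the residual 2D convolution design are not taken from this repo, and the interface that nets/aanet.py expects from an aggregation module may differ (for example, it may pass several cost volumes at different scales).

```python
import torch
import torch.nn as nn


class MyAggregation(nn.Module):
    """Hypothetical aggregation: residual 2D convs over a [B, D, H, W] cost volume."""

    def __init__(self, max_disp, hidden_channels=32):
        super().__init__()
        # The disparity dimension D is treated as the channel dimension.
        self.net = nn.Sequential(
            nn.Conv2d(max_disp, hidden_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, max_disp, kernel_size=3, padding=1),
        )

    def forward(self, cost_volume):
        # The residual connection keeps the raw costs and learns a correction.
        return cost_volume + self.net(cost_volume)


if __name__ == "__main__":
    module = MyAggregation(max_disp=8)
    cost = torch.randn(2, 8, 16, 32)  # [batch, disparity, height, width]
    print(module(cost).shape)
```

The output keeps the input shape, so a module of this kind could in principle be dropped in wherever a cost volume is transformed without changing its size.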
To enable fast experimentation, evaluation runs on-the-fly without saving the intermediate results.
We provide two types of evaluation settings; check scripts/aanet_evaluate.sh for example usage.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{xu2020aanet,
title={AANet: Adaptive Aggregation Network for Efficient Stereo Matching},
author={Xu, Haofei and Zhang, Juyong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1959--1968},
year={2020}
}
Part of the code is adopted from previous works: PSMNet, GwcNet and GA-Net. We thank the original authors for their awesome repos. The deformable convolution op is taken from mmdetection. The FLOPs counting code is modified from pytorch-OpCounter. The code structure is partially inspired by mmdetection and our previous work rdn4depth.