
GNU General Public License v3.0

LMANet: Local Memory Attention for Fast Video Semantic Segmentation (IROS 2021)

LMANet is a fast and general video semantic segmentation pipeline written in PyTorch that can convert an existing single-frame model into a video pipeline.

Authors: Matthieu Paul, Martin Danelljan, Luc Van Gool, Radu Timofte

[Paper] | [Results Video]
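To give an intuition for how the conversion works, here is a conceptual sketch (illustrative only; the class and names below are made up for this README and are not the repository's implementation): LMANet keeps a short FIFO memory of encoder features from past frames, which the current frame attends to before decoding.

# Conceptual sketch only; all names are illustrative, NOT from the repository.
from collections import deque
import torch

class LocalFeatureMemory:
    """FIFO memory of past-frame encoder features (cf. --stm-queue-size)."""
    def __init__(self, queue_size=4):
        self.frames = deque(maxlen=queue_size)

    def update(self, feats):
        # feats: encoder features of the current frame, shape (C, H, W)
        self.frames.append(feats)

    def read(self):
        # Stack the memory along a time dimension, shape (T, C, H, W)
        return torch.stack(tuple(self.frames), dim=0)

memory = LocalFeatureMemory(queue_size=4)
for _ in range(6):  # simulate a stream of frames; old entries are evicted
    memory.update(torch.randn(128, 64, 128))
print(memory.read().shape)  # torch.Size([4, 128, 64, 128])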

Requirements:

Datasets

Cityscapes

Packages

The code was tested with Python 3.8 and PyTorch 1.9.0 with CUDA 11.1.

Additional packages required: numpy, matplotlib, pillow, visdom, opencv, cupy, tqdm

In Anaconda you can install them with:

conda install numpy matplotlib torchvision Pillow tqdm
conda install -c conda-forge visdom
conda install -c conda-forge opencv
conda install -c conda-forge cupy cudatoolkit=11.1
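Once the environment is set up, a quick sanity check (this snippet is not part of the repository) can confirm that the key packages import and that CUDA is visible:

# Quick environment sanity check; not part of the repository.
import cv2, cupy, visdom  # noqa: F401  (imports fail loudly if missing)
import torch

print("PyTorch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("CuPy:", cupy.__version__, "| OpenCV:", cv2.__version__)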

Quick Start

Setting up data and paths

Pretrained models

Download the following pretrained models and copy them into trained_models/. Note that you have to create this folder yourself after first cloning the repository, as shown below.
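For instance, from the repository root (a plain Python equivalent of mkdir trained_models):

# Create the folder that will hold the pretrained weights.
import os
os.makedirs("trained_models", exist_ok=True)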

Datasets path

Setup the path to the root folder of Cityscapes in datasets/cityscapes.py.

CITYSCAPES_ROOT = "/path/to/cityscapes/root/containing/folder/leftImg8bit/"
CITYSCAPES_SEQ_ROOT = "/path/to/cityscapes/root/containing/folder/leftImg8bit_sequence/"

Most likely, those paths will be identical.
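As a quick sanity check (illustrative snippet, not part of the repository), you can verify that both roots contain the expected Cityscapes folders:

# Verify the Cityscapes roots configured in datasets/cityscapes.py.
# The paths below are placeholders; use your own values.
import os

CITYSCAPES_ROOT = "/path/to/cityscapes/root/"
CITYSCAPES_SEQ_ROOT = "/path/to/cityscapes/root/"

for root, folder in [(CITYSCAPES_ROOT, "leftImg8bit"),
                     (CITYSCAPES_SEQ_ROOT, "leftImg8bit_sequence")]:
    path = os.path.join(root, folder)
    print(path, "->", "ok" if os.path.isdir(path) else "MISSING")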

Choosing options

For further options and their defaults, please see the bottom of the main.py file.
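As a rough illustration of what to expect there (a hypothetical sketch; the actual definitions and default values at the bottom of main.py are authoritative), the flags used in the commands below are typically declared with argparse along these lines:

# Hypothetical argparse sketch; the flag names are taken from the commands in
# this README, but types and comments here are illustrative only.
import argparse

parser = argparse.ArgumentParser("LMANet")
parser.add_argument("--eval-mode", action="store_true")  # evaluate instead of train
parser.add_argument("--weights", type=str)               # path to pretrained weights
parser.add_argument("--backbone", type=str)              # e.g. "erf" or "psp101"
parser.add_argument("--corr-size", type=int)             # local attention window size
parser.add_argument("--stm-queue-size", type=int)        # frames kept in memory
parser.add_argument("--savedir", type=str)               # output directory of the run
args = parser.parse_args()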

Evaluation examples

Here are the commands to generate results with the provided pretrained weights, for both the ERFNet and PSPNet backbones.

Example evaluating LMANet with ERFNet:

python main.py --eval-mode --weights trained_models/lmanet_erf_best.pth --backbone erf \
--corr-size 21 --stm-queue-size 4 \
--savedir save/eval_lma_erf

Example evaluating LMANet with PSPNet:

python main.py --eval-mode --weights trained_models/lmanet_psp101_best.pth --backbone psp101 \
--corr-size 21 --stm-queue-size 3 --fusion-strategy sigmoid-down1 \
--savedir save/eval_lma_psp101

Training examples

Example training LMANet with ERFNet:

Here is a simple example of training with ERFNet as the backbone, setting more options and running on two GPUs. Compared to evaluation mode, more training-related parameters are set.

export CUDA_VISIBLE_DEVICES=0,1; \
python main.py --weights trained_models/erfnet_pretrained.pth --backbone erf \
--training --gpus 2 --lr-start 2e-4 --lr-strategy plateau_08 --num-epochs 50 --batch-size 5 \
--corr-size 21 --stm-queue-size 3 --fusion-strategy sigmoid-down1 --memory-strategy all \
--savedir save/training_lma_erf

Example training LMANet with PSPNet:

Here is a simple example of training with PSPNet as the backbone, setting more options and running on six GPUs.

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5; \
python main.py --weights trained_models/pspnet101/model/train_epoch_200.pth --backbone psp101 \
--training --gpus 6 --lr-start 2e-4 --lr-strategy plateau_08 --num-epochs 50 --batch-size 1 \
--corr-size 21 --stm-queue-size 3 --fusion-strategy sigmoid-down1 \
--savedir save/training_lma_psp101

Visualization

In order to visualize debug output (images, predictions and graphs), make sure a visdom server is running by launching visdom, and add the --visualize flag. This flag works in both training and evaluation modes. For instance:

python main.py --eval-mode --weights trained_models/lmanet_erf_best.pth --backbone erf \
--corr-size 21 --stm-queue-size 4 \
--savedir save/eval_lma_erf --visualize

The visualization will be available in your web browser, by default at http://localhost:8097/

Note that graphs and images will be in separate visdom namespaces holding the name of your experiment. This name is defined by the last folder of your savedir. In this example, the related namespaces would contain eval_lma_erf.
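For reference, here is a minimal visdom example (illustrative only; the repository pushes its own graphs and images) showing how data ends up in a named environment:

# Illustrative only: how visdom environments (namespaces) work.
# The environment name below is hypothetical.
import numpy as np
import visdom

vis = visdom.Visdom()  # connects to http://localhost:8097 by default
vis.line(X=np.arange(10), Y=np.random.rand(10),
         env="eval_lma_erf_graphs",
         opts={"title": "example curve"})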

Output files generated by default for each run:

The following files are generated in the specified --savedir for each run:

Additional training files generated

Each training run will create additional files:

Citation

Local Memory Attention for Fast Video Semantic Segmentation, IROS 2021.

If you use this repository for your research, please cite our publication:

@inproceedings{locattseg,
  author    = {Matthieu Paul and
               Martin Danelljan and
               Luc Van Gool and
               Radu Timofte},
  title     = {Local Memory Attention for Fast Video Semantic Segmentation},
  booktitle = {{IEEE/RSJ} International Conference on Intelligent Robots and Systems, {IROS}},
  year      = {2021},
  url       = {https://arxiv.org/abs/2101.01715}
}

Acknowledgement

This repository was originally built from ERFNet. It was modified and extended to support video pipelining.

It also borrows code and models from other repositories such as PSPNet and PyTracking.