
MI-AOD

Language: 简体中文 (Simplified Chinese) | English

Environment: Python 3.7, PyTorch 1.6, CUDA 10.2, cuDNN 7.6.5


Introduction

This is the code for Multiple Instance Active Learning for Object Detection, CVPR 2021 (paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf).

Task Description

In this paper, we propose Multiple Instance Active Object Detection (MI-AOD), to select the most informative images for detector training by observing instance-level uncertainty.

The process of active object detection (active learning for object detection) is shown in the figure below.

Task

First, a small set of images X_L^0 (the labeled set) with instance labels Y_L^0 and a large set of images X_U^0 (the unlabeled set) without labels are given. For each image, the label consists of bounding boxes y_x^loc and categories y_x^cls for objects of interest.

A detection model M_0 is first initialized using the labeled set {X_L^0, Y_L^0}. With the initialized model M_0, active learning aims to select a set of images X_S^0 from X_U^0 to be manually labeled and merged with X_L^0 into a new labeled set X_L^1, i.e., X_L^1 = X_L^0 ∪ X_S^0. The selected image set X_S^0 should be the most informative, i.e., it should improve the detection performance as much as possible.

The informativeness in the figure above is embodied as uncertainty. That is, when a sample in X_U^0 is fed into the current model, the more uniform the model's output scores over the classes, the higher the uncertainty of that sample.
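As a rough illustration (not the repository's code), one common way to quantify this "more uniform → more uncertain" relation is the entropy of the predicted class distribution; note that MI-AOD itself derives uncertainty from classifier discrepancy instead, as described below.

import torch
import torch.nn.functional as F

def entropy_uncertainty(class_logits):
    # Toy uncertainty score: the more uniform the class scores,
    # the higher the entropy, hence the higher the uncertainty.
    probs = F.softmax(class_logits, dim=-1)                     # (num_samples, num_classes)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # higher = more uncertain

# Example: a confident prediction vs. a nearly uniform one
logits = torch.tensor([[9.0, 0.1, 0.1], [1.0, 1.0, 1.0]])
print(entropy_uncertainty(logits))  # the second sample gets the higher score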

Based on the updated labeled set X_L^1, the task model is retrained and updated to M_1. Model training and sample selection are repeated for several cycles until the size of the labeled set reaches the annotation budget.
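Putting these steps together, the active learning cycle can be sketched as follows; train_fn, uncertainty_fn and annotate_fn are hypothetical, user-supplied helpers, and only the control flow follows the description above.

def active_learning_loop(X_L, Y_L, X_U, budget, batch_size,
                         train_fn, uncertainty_fn, annotate_fn):
    # Sketch of the active object detection cycle described above.
    model = train_fn(X_L, Y_L)                           # initialize M_0 on {X_L^0, Y_L^0}
    while len(X_L) < budget and X_U:
        # rank unlabeled images by uncertainty and select the most informative ones
        X_S = sorted(X_U, key=lambda x: uncertainty_fn(model, x), reverse=True)[:batch_size]
        Y_S = annotate_fn(X_S)                           # manual labeling of X_S
        X_L, Y_L = X_L + X_S, Y_L + Y_S                  # X_L^{t+1} = X_L^t ∪ X_S^t
        X_U = [x for x in X_U if x not in X_S]
        model = train_fn(X_L, Y_L)                       # retrain and update to M_{t+1}
    return model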

Illustration

MI-AOD defines an instance uncertainty learning module, which leverages the discrepancy of two adversarial instance classifiers trained on the labeled set to predict instance uncertainty of the unlabeled set. MI-AOD treats unlabeled images as instance bags and feature anchors in images as instances, and estimates the image uncertainty by re-weighting instances in a multiple instance learning (MIL) fashion. Iterative instance uncertainty learning and re-weighting facilitate suppressing noisy instances, toward bridging the gap between instance uncertainty and image-level uncertainty.
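As a rough sketch of this idea (not the repository's implementation, which lives in mmdet/models/dense_heads/MIAOD_head.py), instance uncertainty can be taken as the discrepancy between two instance classifiers, and image uncertainty as a MIL-style re-weighted aggregation of instance uncertainties; the function and variable names below are hypothetical.

import torch

def instance_uncertainty(scores_f1, scores_f2):
    # Discrepancy between the class scores of two adversarial instance classifiers,
    # each of shape (num_instances, num_classes).
    return (scores_f1 - scores_f2).abs().mean(dim=-1)    # (num_instances,)

def image_uncertainty(scores_f1, scores_f2, mil_scores):
    # Treat the image as a bag of instances: re-weight instance uncertainties by
    # MIL scores (num_instances,) so that noisy instances are suppressed.
    u_inst = instance_uncertainty(scores_f1, scores_f2)  # (num_instances,)
    weights = torch.softmax(mil_scores, dim=0)           # instance weights, sum to 1
    return (weights * u_inst).sum()                      # scalar image-level uncertainty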

Here and here are further interpretations of the paper in Chinese.

Illustration

Architecture

Innovation

Results

Broader Impact

MI-AOD focuses on object detection (OD), but it can also be generalized to:

by combining active learning with these tasks. This bottom-up and top-down idea can be generalized and applied to any of these tasks.

Note that although active learning contributes a lot to visual object detection in MI-AOD, other learning methods with less supervision can also be combined with it, such as:

and so on. Such combinations of active learning and other learning methods can promote each other to a greater extent.

Getting Started

Installation

Please refer to Installation.md for installation.

Data Preparation

Please download the VOC2007 dataset (trainval + test) and the VOC2012 dataset (trainval) from:

VOC2007 ( trainval ): http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar

VOC2007 ( test ): http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

VOC2012 ( trainval ): http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

After that, please ensure that the directory tree is as below:

├── VOCdevkit
│   ├── VOC2007
│   │   ├── Annotations
│   │   ├── ImageSets
│   │   ├── JPEGImages
│   ├── VOC2012
│   │   ├── Annotations
│   │   ├── ImageSets
│   │   ├── JPEGImages

You may also use the following commands directly:

cd $YOUR_DATASET_PATH
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar

If you want to use the SSD detector instead of the RetinaNet detector, you can swap the configuration files in this repository as below:

mv configs/MIAOD.py configs/MIAOD_Retina.py
mv configs/MIAOD_SSD.py configs/MIAOD.py

For the SSD detector, because the VGG-16 pre-trained model link provided in the mmcv 1.0.5 package is no longer available, the JSON file that stores the pre-trained model links needs to be updated to the latest version:

wget https://github.com/open-mmlab/mmcv/raw/master/mmcv/model_zoo/open_mmlab.json
cp -v open_mmlab.json $YOUR_ANACONDA_PATH/envs/miaod/lib/python3.7/site-packages/mmcv/model_zoo/

Please change $YOUR_ANACONDA_PATH to your actual Anaconda3 installation directory. Usually it is ~/anaconda3.

After that, please modify the corresponding dataset directories. They are located at:

Line 2 of configs/MIAOD.py: data_root='$YOUR_DATASET_PATH/VOCdevkit/'
Line 2 of configs/_base_/voc0712.py: data_root='$YOUR_DATASET_PATH/VOCdevkit/'

Please change the $YOUR_DATASET_PATHs above to your actual dataset directory (i.e., the directory where you put the downloaded VOC tar files).

Please use an absolute path (i.e., starting with /) rather than a relative path (i.e., starting with ./ or ../).
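For example, if the dataset were stored under /home/user/data (a hypothetical path), Line 2 of both files would read:

data_root = '/home/user/data/VOCdevkit/'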

Please refer to here for the information of other variables and parameters.

Please refer to here for the data preparation on MS COCO.

Train and Test

We recommend using a GPU rather than a CPU for training and testing, because it greatly shortens the time.

We also recommend using a single GPU, because multi-GPU training may result in errors caused by the multi-processing of the dataloader.

However, thanks to @Kevin Chow, here is a feasible solution for training on multiple GPUs.

If you use only a single GPU, you can use the script.sh file directly as below:

chmod 700 ./script.sh
./script.sh $YOUR_GPU_ID

Please change $YOUR_GPU_ID above to your actual GPU ID (a non-negative integer).

If you run the script.sh file for the first time, please ignore the following error:

rm: cannot remove './log_nohup/nohup_$YOUR_GPU_ID.log': No such file or directory

The script.sh file will use the GPU with ID $YOUR_GPU_ID and port (30000 + $YOUR_GPU_ID * 100) for training and testing.
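For example, with a hypothetical GPU ID of 2, the following command would train and test on GPU 2 using port 30200:

./script.sh 2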

The log will not be flushed to the terminal, but will be saved and updated in the files log_nohup/nohup_$YOUR_GPU_ID.log and work_dirs/MI-AOD/$TIMESTAMP.log. These two logs are identical. You can change the directory and name of the latter log file in Line 48 of configs/MIAOD.py.

If you want the log to be flushed to the terminal, you can run these commands instead of using script.sh:

# for single GPU
python tools/train.py $CONFIG_PATH

# for multiple GPUs
tools/dist_train.sh $CONFIG_PATH $GPU_NUMBERS

where $CONFIG_PATH should be replaced by the path of the config file in the configs folder (usually configs/MIAOD.py), and $GPU_NUMBERS by the total number of GPUs used (not a GPU ID).
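For example, with the default config and (hypothetically) 2 GPUs:

# for single GPU
python tools/train.py configs/MIAOD.py

# for multiple GPUs
tools/dist_train.sh configs/MIAOD.py 2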

Similarly, these commands are for testing:

# for single GPU
python tools/test.py $CONFIG_PATH $CKPT_PATH --eval mAP

# for multiple GPUs
tools/dist_test.sh $CONFIG_PATH $CKPT_PATH $GPU_NUMBERS --eval mAP

where $CKPT_PATH should be replaced by the path of the checkpoint file (*.pth) in the work_dirs folder after training.
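For example, assuming a checkpoint named work_dirs/MI-AOD/latest.pth (a hypothetical file name), testing on a single GPU would be:

python tools/test.py configs/MIAOD.py work_dirs/MI-AOD/latest.pth --eval mAP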

If you want to run inference on a single image, the command is as follows:

python tools/test_single.py $CONFIG_PATH $CKPT_PATH $IMG_PATH $OUT_NAME

where $IMG_PATH should be replaced by the path of the image on which you want to run inference, and $OUT_NAME by the output file name, which should usually end with .jpg, .png, etc.

Along with the output image with bounding boxes and scores, the uncertainty of the image will also be printed in the terminal.
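For example, with hypothetical image and checkpoint paths:

python tools/test_single.py configs/MIAOD.py work_dirs/MI-AOD/latest.pth demo.jpg result.jpg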

If you have any questions, please feel free to open an issue here.

And please refer to FAQ for frequently asked questions.

Code Structure

├── $YOUR_ANACONDA_DIRECTORY
│   ├── anaconda3
│   │   ├── envs
│   │   │   ├── miaod
│   │   │   │   ├── lib
│   │   │   │   │   ├── python3.7
│   │   │   │   │   │   ├── site-packages
│   │   │   │   │   │   │   ├── mmcv
│   │   │   │   │   │   │   │   ├── runner
│   │   │   │   │   │   │   │   │   ├── epoch_based_runner.py
│
├── ...
│
├── configs
│   ├── _base_
│   │   ├── default_runtime.py
│   │   ├── retinanet_r50_fpn.py
│   │   ├── voc0712.py
│   ├── MIAOD.py
│── log_nohup
├── mmdet
│   ├── apis
│   │   ├── __init__.py
│   │   ├── inference.py
│   │   ├── test.py
│   │   ├── train.py
│   ├── models
│   │   ├── dense_heads
│   │   │   ├── __init__.py
│   │   │   ├── MIAOD_head.py
│   │   │   ├── MIAOD_retina_head.py
│   │   │   ├── base_dense_head.py 
│   │   ├── detectors
│   │   │   ├── base.py
│   │   │   ├── single_stage.py
│   ├── utils
│   │   ├── active_datasets.py
├── tools
│   ├── test.py
│   ├── test_single.py
│   ├── train.py
├── work_dirs
│   ├── MI-AOD
├── script.sh

The code files and folders shown above are the main part of MI-AOD, while other code files and folders are created following MMDetection V2.3.0 to avoid potential problems.

The explanation of each code file or folder is as follows:

Model Zoo

Models

The trained model for the last cycle of active learning (i.e., using 20% labeled samples) is available on Google Drive and Baidu Drive (extraction code: 1y9x).

Results

Results_RetinaNet_VOC

| Proportion (%) of Labeled Images | 5.0 | 7.5 | 10.0 | 12.5 | 15.0 | 17.5 | 20.0 | 100.0 (Full supervision) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mAP (%) of MI-AOD | 47.18 | 58.41 | 64.02 | 67.72 | 69.79 | 71.07 | 72.27 | 77.28 |
| Ratio (%) of the performance to full supervision | 61.05 | 75.58 | 82.84 | 87.63 | 90.31 | 91.96 | 93.52 | 100.00 |

The training and test logs are available on Google Drive and Baidu Drive (Extraction code: 7a6m).

You can also use other files in the directory work_dirs/MI-AOD/ if you like. They are as follows:

An example output folder is provided on Google Drive and Baidu Drive (Extraction code: ztd6), including the log file, the last trained model, and all other files above.

Repository Contributor

In this repository, we reimplemented RetinaNet in PyTorch based on mmdetection. Thanks for their contribution.

License

This project is released under the Apache 2.0 license.

Citation

If you find this repository useful for your publications, please consider citing our paper.

@inproceedings{MIAOD2021,
    author    = {Tianning Yuan and
                 Fang Wan and
                 Mengying Fu and
                 Jianzhuang Liu and
                 Songcen Xu and
                 Xiangyang Ji and
                 Qixiang Ye},
    title     = {Multiple Instance Active Learning for Object Detection},
    booktitle = {CVPR},
    year      = {2021}
}
