zyw-stu / CPA-Enhancer

This is the official repository of the paper: CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
Apache License 2.0

CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations

📰 arXiv Preprint: arXiv:2403.11220

✅ Updates

🚀 Overview

Overall Workflow of the CPA-Enhancer Framework
Overview of the proposed CPA-Enhancer.

Our proposed content-driven prompt block (CPB).

Abstract: Object detection methods under known single degradations have been extensively investigated. However, existing approaches require prior knowledge of the degradation type and train a separate model for each, limiting their practical applications in unpredictable environments. To address this challenge, we propose a chain-of-thought (CoT) prompted adaptive enhancer, CPA-Enhancer, for object detection under unknown degradations. Specifically, CPA-Enhancer progressively adapts its enhancement strategy under the step-by-step guidance of CoT prompts that encode degradation-related information. To the best of our knowledge, it is the first work that exploits CoT prompting for object detection tasks. Overall, CPA-Enhancer is a plug-and-play enhancement model that can be integrated into any generic detector to achieve substantial gains on degraded images without prior knowledge of the degradation type. Experimental results demonstrate that CPA-Enhancer not only sets a new state of the art for object detection but also boosts the performance of other downstream vision tasks under multiple unknown degradations.

🛠️ Installation

conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
cd CPA_Enhancer
pip install -r ./cpa/requirements.txt
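
After installation, a quick sanity check can confirm that the expected packages are visible in the active environment. The sketch below is not part of the repository: the helper name `installed_versions` is hypothetical, and it only reports what `importlib.metadata` finds for the distribution names used in the commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the install commands above.
    report = installed_versions(["torch", "torchvision", "mmengine", "mmcv"])
    for name, version in report.items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

This checks only that the packages are installed, not that their versions are compatible with each other.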

Data Preparation

Synthetic Datasets

$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# 5 classes
target_classes = ['person', 'car', 'bus', 'bicycle', 'motorbike']
# 10 classes
target_classes = ['bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person']
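
The class lists above suggest that each subset keeps only the images that show at least one target class. Here is a minimal sketch of such a filtering step, assuming standard VOC XML annotations in which each `<object>` has a `<name>` child. The function name `select_image_ids` is hypothetical, and this is not the repository's own script.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def select_image_ids(annotation_dir, target_classes):
    """Return sorted ids of VOC annotation files that contain a target class."""
    wanted = set(target_classes)
    selected = []
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # Each <object> element carries its class label in a <name> child.
        names = {obj.findtext("name") for obj in root.iter("object")}
        if names & wanted:
            selected.append(xml_path.stem)
    return selected
```

For example, `select_image_ids("VOCdevkit/VOC2007/Annotations", target_classes)` would list the candidate image ids for one of the extracted archives, assuming the default VOC devkit folder layout.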

Make sure the directory follows this basic VOC structure.

data_vocnorm (data_vocnorm_10)     # path/to/vocnorm
├── train   # VnA-T (VnB-T)
|    ├── Annotations
|    |    └── xxx.xml
|    |        ...
|    ├── ImageSets
|    |    └── Main
|    |         └── train_voc.txt  # you can find it in cpa/dataSyn/datalist
|    └── JPEGImages
|         └── xxx.jpg
|             ...
└── test    # VnA (VnB)
     ├── Annotations
     |    └── xxx.xml
     |        ...
     ├── ImageSets
     |    └── Main
     |         └── test_voc.txt   # you can find it in cpa/dataSyn/datalist
     └── JPEGImages
          └── xxx.jpg
              ...
# Modify the paths in the code to match your actual paths.
# all-in-one setting 
python cpa/dataSyn/data_make_fog.py         # VF/VF-T 
python cpa/dataSyn/data_make_lowlight.py    # VD/VD-T/VDB
python cpa/dataSyn/data_make_snow.py        # VS/VS-T
python cpa/dataSyn/data_make_rain.py        # VR/VR-T
# one-by-one setting 
python cpa/dataSyn/data_make_fog_hybrid.py          # VF-HT
python cpa/dataSyn/data_make_lowlight_hybrid.py     # VD-HT

Real-world Datasets

RTTS          # path/to/RTTS
├── Annotations
|    └── xxx.xml
|        ...
├── ImageSets
|    └── Main
|         └── test_rtts.txt
└── JPEGImages
     └── xxx.jpg
         ...

exdark_5 (exdark_10)         # path/to/ExDarkA (ExDarkB)
├── Annotations
|    └── xxx.xml
|        ...
├── ImageSets
|    └── Main
|         └── test_exdark_5.txt (test_exdark_10.txt)  # you can find it in cpa/dataSyn/datalist
└── JPEGImages
     └── xxx.jpg
         ...
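
Before training or testing, it can help to confirm that a dataset folder matches the VOC layout shown above. The following is a minimal sketch, not part of the repository. The function name `check_voc_layout` is hypothetical, and it assumes the list file contains one image id per whitespace-separated token, as in standard VOC split files.

```python
from pathlib import Path


def check_voc_layout(root, list_name):
    """Return a list of problems found in a VOC-style dataset folder."""
    root = Path(root)
    problems = []
    list_file = root / "ImageSets" / "Main" / list_name
    for required in (root / "Annotations", root / "JPEGImages", list_file):
        if not required.exists():
            problems.append(f"missing: {required}")
    if problems:
        return problems
    # Every id in the list file should have an annotation and an image.
    for image_id in list_file.read_text().split():
        if not (root / "Annotations" / f"{image_id}.xml").is_file():
            problems.append(f"no annotation for {image_id}")
        if not (root / "JPEGImages" / f"{image_id}.jpg").is_file():
            problems.append(f"no image for {image_id}")
    return problems
```

An empty list means the folder is consistent. For instance, `check_voc_layout("path/to/RTTS", "test_rtts.txt")` would validate the RTTS folder.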

🎯 Usage

All-in-One Setting

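# Dataset metadata (in MMDetection 3.x: VOCDataset.METAINFO in mmdet/datasets/voc.py)
METAINFO = {
    'classes': ('person', 'car', 'bus', 'bicycle', 'motorbike'),  # 5 classes
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255)]
}

# Class names used for evaluation (in MMDetection 3.x: mmdet/evaluation/functional/class_names.py)
def voc_classes() -> list:
    return [
        'person', 'car', 'bus', 'bicycle', 'motorbike'  # 5 classes
    ]

# Detection head in the model config (configs/yolo/cpa_config.py)
bbox_head=dict(
    type='YOLOV3Head',
    num_classes=5,  # 5 classes
    ...
)

# Reinstall in editable mode so that the changes above take effect
cd CPA_Enhancer
pip install -v -e .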
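
The three class settings above (`METAINFO`, `voc_classes()`, and `num_classes`) need to agree, and a mismatch is easy to miss when switching between the 5-class and 10-class setups. Below is a minimal consistency check, offered as a sketch. The helper `check_class_settings` is hypothetical and does not exist in the repository or in MMDetection.

```python
def check_class_settings(metainfo, class_names, num_classes):
    """Return a list of mismatches between the three class settings."""
    issues = []
    classes = tuple(metainfo["classes"])
    if len(metainfo["palette"]) != len(classes):
        issues.append("palette length differs from number of classes")
    if tuple(class_names) != classes:
        issues.append("voc_classes() differs from METAINFO['classes']")
    if num_classes != len(classes):
        issues.append("num_classes differs from number of classes")
    return issues


# Values copied from the 5-class setup above.
METAINFO = {
    "classes": ("person", "car", "bus", "bicycle", "motorbike"),
    "palette": [(106, 0, 228), (119, 11, 32), (165, 42, 42),
                (0, 0, 192), (197, 226, 255)],
}
VOC_CLASSES = ["person", "car", "bus", "bicycle", "motorbike"]

print(check_class_settings(METAINFO, VOC_CLASSES, 5))  # prints []
```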

The pretrained models and training/testing logs can be found in checkpoint.zip.

🔹 Train

# Train our model from scratch.  
python tools/train.py configs/yolo/cpa_config.py  

🔹 Test

# You can download our pretrained model for testing.
python tools/test.py configs/yolo/cpa_config.py path/to/checkpoint/xx.pth

🔹 Demo

# You can download our pretrained model for inference.
#   --inputs   path to your input images or directory
#   --out-dir  output directory
python demo/cpa_demo.py \
    --inputs ../cpa/testimage \
    --model ../configs/yolo/cpa_config.py \
    --weights path/to/checkpoint/xx.pth \
    --out-dir ../cpa/output
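
The demo flags listed above can also be passed as an argument list from Python, which avoids shell line-continuation pitfalls. This is only a sketch: the helper `build_demo_command` is hypothetical, and the flag names and example paths are copied from the command above.

```python
import shlex
import sys


def build_demo_command(inputs, model, weights, out_dir):
    """Assemble the demo invocation as an argument list (no shell needed)."""
    return [
        sys.executable, "demo/cpa_demo.py",
        "--inputs", inputs,
        "--model", model,
        "--weights", weights,
        "--out-dir", out_dir,
    ]


cmd = build_demo_command(
    "../cpa/testimage",
    "../configs/yolo/cpa_config.py",
    "path/to/checkpoint/xx.pth",
    "../cpa/output",
)
# Show the equivalent shell command; to execute it instead, use
# subprocess.run(cmd, check=True) from the repository root.
print(shlex.join(cmd))
```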

One-by-One Setting

For foggy conditions (five categories), the overall process is the same as above (Steps 1-5).

For low-light conditions (ten categories), you only need to modify the following places (Steps 1-3).

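# 10 classes
# Dataset metadata (in MMDetection 3.x: VOCDataset.METAINFO in mmdet/datasets/voc.py)
METAINFO = {
    'classes': ('bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person'),
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255),
                (0, 60, 100), (0, 0, 142), (255, 77, 255), (153, 69, 1), (120, 166, 157)]
}

# Class names used for evaluation (in MMDetection 3.x: mmdet/evaluation/functional/class_names.py)
def voc_classes() -> list:
    return [
        'bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person'  # 10 classes
    ]

# Detection head in the model config (configs/yolo/cpa_config.py)
bbox_head=dict(
    type='YOLOV3Head',
    num_classes=10,  # 10 classes
    ...
)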

📊 Results

Quantitative results

Quantitative comparisons under the all-in-one setting.

Comparisons in the one-by-one setting under foggy degradation (left) and low-light degradation (right).

Visual Results

Visual comparisons of CPA-Enhancer under the all-in-one setting.

Acknowledgments

Special thanks to the creators of MMDetection, upon which this code is built, for their valuable work in advancing object detection research.

🔗 Citation

If you use this codebase, or CPA-Enhancer inspires your work, we would greatly appreciate it if you could star the repository and cite it using the following BibTeX entry.

@misc{zhang2024cpaenhancer,
      title={CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations}, 
      author={Yuwei Zhang and Yan Wu and Yanming Liu and Xinyue Peng},
      year={2024},
      eprint={2403.11220},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}