moothes / A2S-v2

A more robust Unsupervised Salient Object Detection (USOD) framework.
MIT License

Activation-to-Saliency version 2 (A2S-v2)

A2S-v3 has been accepted by IJCAI 2024! You are welcome to check out the latest A2S-v3 code and build on it.

The A2S naming scheme is open to the community: follow-up works may use names such as A2S-v4, as long as they 1) are published in top-tier conferences or journals and 2) do not conflict with other works.

Source code of our CVPR 2023 paper: "Texture-guided Saliency Distilling for Unsupervised Salient Object Detection".
This work improves on our previous Activation-to-Saliency method (A2S-v1), published in TCSVT 2023.
Both works are built on the SOD benchmark.


You can download the pre-trained MoCo-v2 weight and all trained weights of our method.
RGB SOD results: pseudo labels and saliency maps.
Results on other multimodal SOD datasets can be easily generated using our code.

Training & Testing


For convenience, we have reorganized the datasets commonly used in SOD tasks.
Task | Stage 1 network | Stage 2 network | Training sets | Test sets
RGB | a2s | cornet | [cr] DUTS-TR or MSB-TR | [ce] HKU-IS, PASCAL-S, ECSSD, DUTS-TE, DUT-OMRON, MSB-TE
RGB-D | a2s | midnet | [dr] RGBD-TR or RGBD-TR-2985 | [de] DUT, LFSD, NJUD, NLPR, RGBD135, SIP, SSD, STERE1000, STEREO
RGB-T | a2s | midnet | [tr] VT5000-TR | [te] VT821, VT1000 and VT5000-TE
Video | a2s | midnet | [or] VSOD-TR | [oe] SegV2, FBMS, DAVIS-TE, DAVSOD-TE

Networks a2s and cornet are inherited from our previous A2S-v1 and midnet is from here.
MSB-TR and MSB-TE are the train+val and test splits of the MSRA-B dataset.
RGBD-TR (2185 samples, default) and RGBD-TR-2985 (2985 samples) are two alternative training sets for the RGB-D SOD task.
VT5000-TR and VT5000-TE are the train and test splits of the VT5000 dataset.
VSOD-TR is the collection of the train splits of the DAVIS and DAVSOD datasets.
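
The table above can be read as a lookup from a two-character code to a list of dataset names. The Python sketch below only restates that table; the names `DATASETS` and `datasets_for` are illustrative and are not taken from the repository. For `cr` and `dr` the table lists alternatives rather than a union, so both options are simply recorded.

```python
# Illustrative restatement of the dataset table; not code from the repository.
DATASETS = {
    "cr": ["DUTS-TR", "MSB-TR"],        # alternatives: one or the other
    "ce": ["HKU-IS", "PASCAL-S", "ECSSD", "DUTS-TE", "DUT-OMRON", "MSB-TE"],
    "dr": ["RGBD-TR", "RGBD-TR-2985"],  # alternatives: RGBD-TR is the default
    "de": ["DUT", "LFSD", "NJUD", "NLPR", "RGBD135", "SIP", "SSD",
           "STERE1000", "STEREO"],
    "tr": ["VT5000-TR"],
    "te": ["VT821", "VT1000", "VT5000-TE"],
    "or": ["VSOD-TR"],
    "oe": ["SegV2", "FBMS", "DAVIS-TE", "DAVSOD-TE"],
}


def datasets_for(code):
    """Return the dataset names listed in the table for a code such as 'ce'."""
    if code not in DATASETS:
        raise ValueError("unknown dataset code: %r" % (code,))
    return list(DATASETS[code])


if __name__ == "__main__":
    for code in sorted(DATASETS):
        print(code, "->", ", ".join(datasets_for(code)))
```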


--vals takes a two-character code that selects the datasets used for testing.
First character (task): RGB[c], RGB-D[d], RGB-T[t], or video[o];
Second character (phase): training[r] or test[e] sets.
--trset selects the training sets for each task, using the same letters as the first character of --vals.
More details please refer to
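
To illustrate how the two-character codes decompose, here is a small Python sketch that splits a `--vals` code into its task and phase and expands a `--trset` string into task names. The helper names `describe_vals` and `describe_trset` are hypothetical, and the repository's own argument handling may differ.

```python
# Hypothetical helpers that decode the codes described above.
TASKS = {"c": "RGB", "d": "RGB-D", "t": "RGB-T", "o": "video"}
PHASES = {"r": "training", "e": "test"}


def describe_vals(code):
    """Split a two-character --vals code into (task, phase)."""
    if len(code) != 2 or code[0] not in TASKS or code[1] not in PHASES:
        raise ValueError("expected a code such as 'ce' or 'dr', got %r" % (code,))
    return TASKS[code[0]], PHASES[code[1]]


def describe_trset(trset):
    """Expand a --trset string such as 'cdot' into a list of task names."""
    unknown = [ch for ch in trset if ch not in TASKS]
    if unknown:
        raise ValueError("unknown task letters: %r" % (unknown,))
    return [TASKS[ch] for ch in trset]


if __name__ == "__main__":
    print(describe_vals("ce"))
    print(describe_trset("cdot"))
```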

Stage 1

 ## Training
 # Training for RGB SOD task
 python3 a2s --gpus=[gpu_num] --trset=c

 # Split training for single multimodal task
 python3 a2s --gpus=[gpu_num] --trset=[d/o/t]

 # Joint training for four multimodal tasks
 python3 a2s --gpus=[gpu_num] --trset=cdot

 ## Testing
 # Generating pseudo labels
 python3 a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[cr/dr/or/tr] --save --crf

 # Testing on test sets
 python3 a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]

After the training process in stage 1, we will generate pseudo labels for all training sets and save them to a new pseudo folder.
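
When pseudo labels are needed for several training sets, the commands can be generated rather than typed by hand. The sketch below only builds command strings that mirror the usage shown above, one per training-set code; it does not run anything. The function name `pseudo_label_commands` and the example weight path are placeholders, not names from the repository.

```python
import shlex


def pseudo_label_commands(weight, gpu_num="0", tasks="cdot"):
    """Build one pseudo-label command per task letter, mirroring the usage above."""
    commands = []
    for task in tasks:
        argv = [
            "python3", "a2s",
            "--gpus=%s" % gpu_num,
            "--weight=%s" % weight,
            "--vals=%sr" % task,  # 'r' selects the training split of the task
            "--save",
            "--crf",
        ]
        commands.append(shlex.join(argv))  # shlex.join needs Python 3.8+
    return commands


if __name__ == "__main__":
    # "path/to/stage1_weight" is a placeholder path.
    for cmd in pseudo_label_commands("path/to/stage1_weight"):
        print(cmd)
```

Printing the commands first makes it easy to check them before running them in a shell.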

Stage 2

 ## Training
 # Training for RGB SOD task
 python3 cornet --gpus=[gpu_num] --stage=2 --trset=c --vals=ce

 # Training for RGB-D, RGB-T or video SOD tasks
 python3 midnet --gpus=[gpu_num] --stage=2 --trset=[d/o/t] --vals=[de/oe/te]

 ## Testing
 python3 [cornet/midnet] --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]
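
The stage 2 network depends on the task: the table and commands above pair RGB with cornet and the other three tasks with midnet. A minimal Python sketch of that selection follows; `STAGE2_NETWORK` and `stage2_train_command` are illustrative names, and the generated string simply mirrors the training commands shown above.

```python
# Task letter -> stage 2 network, as listed in the dataset table.
STAGE2_NETWORK = {"c": "cornet", "d": "midnet", "t": "midnet", "o": "midnet"}


def stage2_train_command(task, gpu_num="0"):
    """Build the stage 2 training command for a single task letter."""
    if task not in STAGE2_NETWORK:
        raise ValueError("unknown task letter: %r" % (task,))
    network = STAGE2_NETWORK[task]
    # Train on the task's training set and validate on its test sets.
    return "python3 %s --gpus=%s --stage=2 --trset=%s --vals=%se" % (
        network, gpu_num, task, task)


if __name__ == "__main__":
    for task in "cdot":
        print(stage2_train_command(task))
```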


If you find this work useful, please cite our series of works.

Huajun Zhou, Bo Qiao, Lingxiao Yang, Jianhuang Lai, and Xiaohua Xie. "Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.

Huajun Zhou, Peijia Chen, Lingxiao Yang, Xiaohua Xie, and Jianhuang Lai. "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection." IEEE Transactions on Circuits and Systems for Video Technology, 2023.

