yangbincv / SDCL

MIT License

Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification

The official repository for Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification. We achieve state-of-the-art performance on the unsupervised visible-infrared person re-identification task.


  1. We propose a shallow-deep collaborative learning framework based on the transformer architecture. This framework facilitates the learning of robust representations, effectively countering the cross-modality discrepancy through the collaboration of shallow and deep features.
  2. We propose a collaborative neighbor learning module to formulate dependable intra-modality and cross-modality neighbor learning, enabling the model to capture modality-invariant and discriminative features.
  3. We propose a collaborative ranking association module to explore intra-modality and cross-modality ranking consistencies, unifying the cross-modality labels and providing invaluable cross-modality supervision.
  4. Extensive experiments validate that our SDCL framework surpasses existing methods on two mainstream VI-ReID benchmarks, consistently improving the unsupervised cross-modality retrieval performance.

Prepare Datasets

Put the SYSU-MM01 and RegDB datasets into data/sysu and data/regdb respectively, then run prepare_sysu.py and prepare_regdb.py to prepare the training data (this converts it to the Market-1501 format). See the previous works ADCA or GUR for details.
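
Before running the preparation scripts, it can help to confirm that the raw datasets are where the instructions above place them. The following is a minimal sketch, assuming the repository root is the working directory; `missing_dataset_dirs` is a hypothetical helper and not part of this repository, and it only checks that the directories exist and are non-empty, not that their contents are valid.

```python
from pathlib import Path

# Directories named in the dataset instructions above.
EXPECTED_DIRS = ("data/sysu", "data/regdb")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that are absent or empty under root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Dataset directories found; run prepare_sysu.py and prepare_regdb.py next.")
```

If the check passes, the two preparation scripts can be run as described above.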

Prepare Pre-trained model

We adopt the self-supervised pre-trained model (ViT-B/16+ICS) from Self-Supervised Pre-Training for Transformer-Based Person Re-Identification. Download link: https://drive.google.com/file/d/1ZFMCBZ-lNFMeBD5K8PtJYJfYEk5D9isd/view


We utilize 2 A100 GPUs for training.



SYSU-MM01

  1. Train:

    sh train_cc_vit_sysu.sh
  2. Test:

    sh test_cc_vit_sysu.sh


RegDB

  1. Train:

    sh train_cc_vit_regdb.sh
  2. Test:

    sh test_cc_vit_regdb.sh
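
The train and test scripts above follow the same naming pattern for both datasets, so running them back to back can be sketched with a small wrapper. This is only an illustration under assumptions: `pipeline_commands` and `run_pipeline` are hypothetical helpers (not part of this repository), the scripts are assumed to sit in the current working directory, and the wrapper defaults to a dry run that only prints the commands.

```python
import subprocess

# Dataset suffixes used in the script names above.
DATASETS = ("sysu", "regdb")


def pipeline_commands(dataset):
    """Return the train and test commands for one dataset, in run order."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        ["sh", f"train_cc_vit_{dataset}.sh"],
        ["sh", f"test_cc_vit_{dataset}.sh"],
    ]


def run_pipeline(dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops before testing if training exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("sysu")  # dry run: only prints the two commands
```

Calling `run_pipeline("regdb", dry_run=False)` would execute the RegDB scripts in order and stop at the first failure.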


This code is based on the previous work ADCA. If you find this code useful for your research, please cite our papers.

  title={Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification},
  author={Yang, Bin and Chen, Jun and Ye, Mang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},

  title={Dual Consistency-Constrained Learning for Unsupervised Visible-Infrared Person Re-Identification},
  author={Yang, Bin and Chen, Jun and Chen, Cuiqun and Ye, Mang},
  journal={IEEE Transactions on Information Forensics and Security},

    author    = {Yang, Bin and Chen, Jun and Ye, Mang},
    title     = {Towards Grand Unified Representation Learning for Unsupervised Visible-Infrared Person Re-Identification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {11069-11079}

  title={Augmented Dual-Contrastive Aggregation Learning for Unsupervised Visible-Infrared Person Re-Identification},
  author={Yang, Bin and Ye, Mang and Chen, Jun and Wu, Zesen},
  pages = {2843–2851},
  booktitle = {ACM MM},

  title={Translation, association and augmentation: Learning cross-modality re-identification from single-modality annotation},
  author={Yang, Bin and Chen, Jun and Ma, Xianzheng and Ye, Mang},
  journal={IEEE Transactions on Image Processing},


Depending on the environment, the model may collapse in the first epoch, leading to poor results. If this happens, increase the args.momentum parameter for the first epoch and restore it to 0.1 in the second epoch. If you have any problems reproducing the results, feel free to contact us!


yangbin_cv@whu.edu.cn; yemang@whu.edu.cn.
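
The first-epoch momentum workaround described above can be sketched as a small schedule function. This is only an illustration: `momentum_for_epoch` is a hypothetical helper, epochs are assumed to be 0-indexed, and the raised first-epoch value is left as a required argument because no specific value is given here. How the result is assigned to `args.momentum` depends on the training script.

```python
def momentum_for_epoch(epoch, first_epoch_momentum, base_momentum=0.1):
    """Return the momentum to use for a given 0-indexed epoch.

    The first epoch uses the raised value; every later epoch uses base_momentum.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return first_epoch_momentum if epoch == 0 else base_momentum


if __name__ == "__main__":
    # 0.5 is an arbitrary illustration value, not a setting recommended by the authors.
    for epoch in range(3):
        print(epoch, momentum_for_epoch(epoch, first_epoch_momentum=0.5))
```

Keeping the raised value as an explicit argument makes it easy to try different first-epoch settings without touching the default of 0.1 used afterwards.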