Peng Hu, Hongyuan Zhu, Jie Lin, Dezhong Peng, Yin-Ping Zhao, Xi Peng*, "Unsupervised Contrastive Cross-modal Hashing," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 3, pp. 3877-3889, 1 March 2023, doi: 10.1109/TPAMI.2022.3177356. (PyTorch Code)
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. Specifically, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable within CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not rely on the binary-continuous relaxation used by most existing methods and therefore enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), i.e., within-class pairs that are wrongly treated as negative pairs, we propose a Cross-modal Ranking Learning loss (CRL) that exploits the discrimination from all negative pairs instead of only the hard ones. Thanks to this global strategy, CRL endows our method with better performance because it does not overuse the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely used datasets and compare against 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
<img src="paper/UCCH.jpg" class="center" />
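As a rough, illustrative sketch of the ranking idea behind CRL, the snippet below computes a margin-based hinge loss over all cross-modal negative pairs in a mini-batch and blends the hardest-negative term with the average over all negatives. This is only a sketch under assumptions: the exact CRL formulation is given in the paper, the function and variable names are illustrative, and the way `margin` and `alpha` are used here mirrors the training flags below only loosely, not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_ranking_loss(img_emb, txt_emb, margin=0.2, alpha=0.7):
    """Illustrative margin ranking over ALL cross-modal negatives in a batch.

    img_emb, txt_emb: (batch, dim) continuous hash representations whose
    i-th rows form a positive (co-occurring) image-text pair.
    """
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()                  # (batch, batch) cosine similarities
    pos = sim.diag().view(-1, 1)                 # similarities of the matched pairs
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

    # Hinge violations of every negative pair, for both retrieval directions.
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(eye, 0)      # image -> text
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(eye, 0)  # text -> image

    hardest = cost_i2t.max(dim=1).values.mean() + cost_t2i.max(dim=0).values.mean()
    overall = cost_i2t.mean() + cost_t2i.mean()
    return alpha * hardest + (1.0 - alpha) * overall
```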
To train a model with 128 bits on MIRFLICKR-25K, just run UCCH.py:
# Features
python UCCH.py --data_name mirflickr25k_fea --bit 128 --alpha 0.7 --num_hiden_layers 3 2 --margin 0.2 --max_epochs 20 --train_batch_size 256 --shift 0.1 --lr 0.0001 --optimizer Adam
# Raw data
python UCCH.py --data_name mirflickr25k --bit 128 --alpha 0.7 --num_hiden_layers 3 2 --margin 0.2 --max_epochs 20 --train_batch_size 256 --shift 0.1 --lr 0.0001 --optimizer Adam --warmup_epoch 5 --pretrain -a vgg11
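The `--bit` flag sets the hash-code length; at retrieval time the real-valued network outputs are binarized with a sign function, and keeping that discrete step trainable is exactly what the paper's momentum optimizer is for. As a generic illustration of the quantization problem (explicitly not the authors' momentum optimizer), a common straight-through sign estimator looks like this:

```python
import torch

class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients straight through the quantizer

def to_hash_codes(continuous_outputs):
    """Binarize continuous network outputs into {-1, +1} hash codes."""
    return SignSTE.apply(continuous_outputs)
```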
You should get output similar to the following:
Epoch: 13 / 20
[================= 70/70 ====================>] Step: 28ms | Tot: 2s18ms | Loss: 13.205 | LR: 0.0001
Evaluation: Img2Txt: 0.75797 Txt2Img: 0.759172 Avg: 0.758571
Epoch: 14 / 20
[================= 70/70 ====================>] Step: 28ms | Tot: 1s951ms | Loss: 13.193 | LR: 0.0001
Evaluation: Img2Txt: 0.759404 Txt2Img: 0.759482 Avg: 0.759443
Epoch: 15 / 20
[================= 70/70 ====================>] Step: 28ms | Tot: 1s965ms | Loss: 13.180 | LR: 0.0001
Evaluation: Img2Txt: 0.758604 Txt2Img: 0.75909 Avg: 0.758847
Epoch: 16 / 20
[================= 70/70 ====================>] Step: 28ms | Tot: 1s973ms | Loss: 13.170 | LR: 0.0001
Evaluation: Img2Txt: 0.758019 Txt2Img: 0.757934 Avg: 0.757976
Epoch: 17 / 20
[================= 70/70 ====================>] Step: 28ms | Tot: 1s973ms | Loss: 13.160 | LR: 0.0001
Evaluation: Img2Txt: 0.757612 Txt2Img: 0.758054 Avg: 0.757833
Epoch: 18 / 20
[================= 70/70 ====================>] Step: 29ms | Tot: 1s968ms | Loss: 13.151 | LR: 0.0001
Evaluation: Img2Txt: 0.757199 Txt2Img: 0.757834 Avg: 0.757517
Epoch: 19 / 20
[================= 70/70 ====================>] Step: 30ms | Tot: 2s43ms | Loss: 13.144 | LR: 0.0001
Evaluation: Img2Txt: 0.757373 Txt2Img: 0.757289 Avg: 0.757331
Test: Img2Txt: 0.769567 Txt2Img: 0.746658 Avg: 0.758112
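The `Img2Txt` / `Txt2Img` numbers above are mean average precision (mAP) scores obtained by Hamming ranking. Below is a generic sketch of how such cross-modal mAP is commonly computed from binary codes and multi-hot labels; it is an illustration, not necessarily the evaluation code used in this repository:

```python
import numpy as np

def cross_modal_map(query_codes, db_codes, query_labels, db_labels):
    """Mean average precision by Hamming ranking.

    query_codes, db_codes: {-1, +1} matrices of shape (n, n_bits).
    query_labels, db_labels: multi-hot 0/1 matrices; two samples count as
    relevant if they share at least one label. For Img2Txt the queries are
    image codes and the database holds text codes (and vice versa).
    """
    n_bits = db_codes.shape[1]
    aps = []
    for q, ql in zip(query_codes, query_labels):
        hamming = 0.5 * (n_bits - db_codes @ q)    # Hamming distance to the query
        order = np.argsort(hamming)                # rank database items by distance
        relevant = ((db_labels[order] @ ql) > 0).astype(float)
        n_rel = relevant.sum()
        if n_rel == 0:
            continue
        ranks = np.arange(1, len(relevant) + 1)
        precision = np.cumsum(relevant) / ranks    # precision at every rank
        aps.append(float((precision * relevant).sum() / n_rel))
    return float(np.mean(aps))
```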

Comparison with the state of the art in terms of mAP on MIRFLICKR-25K (MIR) and IAPR TC-12 (IAPR). I→T denotes image-to-text retrieval, T→I denotes text-to-image retrieval, and 16/32/64/128 are the hash-code lengths in bits.

Method | MIR I→T 16 | MIR I→T 32 | MIR I→T 64 | MIR I→T 128 | MIR T→I 16 | MIR T→I 32 | MIR T→I 64 | MIR T→I 128 | IAPR I→T 16 | IAPR I→T 32 | IAPR I→T 64 | IAPR I→T 128 | IAPR T→I 16 | IAPR T→I 32 | IAPR T→I 64 | IAPR T→I 128 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CVH[20] | 0.620 | 0.608 | 0.594 | 0.583 | 0.629 | 0.615 | 0.599 | 0.587 | 0.392 | 0.378 | 0.366 | 0.353 | 0.398 | 0.384 | 0.372 | 0.360 |
LSSH[59] | 0.597 | 0.609 | 0.606 | 0.605 | 0.602 | 0.598 | 0.598 | 0.597 | 0.372 | 0.386 | 0.396 | 0.404 | 0.367 | 0.380 | 0.392 | 0.401 |
CMFH[60] | 0.557 | 0.557 | 0.556 | 0.557 | 0.553 | 0.553 | 0.553 | 0.553 | 0.312 | 0.314 | 0.314 | 0.315 | 0.306 | 0.306 | 0.306 | 0.306 |
FSH[18] | 0.581 | 0.612 | 0.635 | 0.662 | 0.576 | 0.607 | 0.635 | 0.660 | 0.377 | 0.392 | 0.417 | 0.445 | 0.383 | 0.399 | 0.425 | 0.451 |
DLFH[23] | 0.638 | 0.658 | 0.677 | 0.684 | 0.675 | 0.700 | 0.718 | 0.725 | 0.342 | 0.358 | 0.374 | 0.395 | 0.358 | 0.380 | 0.403 | 0.434 |
MTFH[16] | 0.507 | 0.512 | 0.558 | 0.554 | 0.514 | 0.524 | 0.518 | 0.581 | 0.277 | 0.324 | 0.303 | 0.311 | 0.294 | 0.337 | 0.269 | 0.297 |
FOMH[58] | 0.575 | 0.640 | 0.691 | 0.659 | 0.585 | 0.648 | 0.719 | 0.688 | 0.312 | 0.316 | 0.317 | 0.350 | 0.311 | 0.315 | 0.322 | 0.373 |
DCH[34] | 0.596 | 0.602 | 0.626 | 0.636 | 0.612 | 0.623 | 0.653 | 0.665 | 0.336 | 0.336 | 0.344 | 0.352 | 0.350 | 0.358 | 0.374 | 0.391 |
UGACH[61] | 0.685 | 0.693 | 0.704 | 0.702 | 0.673 | 0.676 | 0.686 | 0.690 | 0.462 | 0.467 | 0.469 | 0.480 | 0.447 | 0.463 | 0.468 | 0.463 |
DJSRH[62] | 0.652 | 0.697 | 0.700 | 0.716 | 0.662 | 0.691 | 0.683 | 0.695 | 0.409 | 0.412 | 0.470 | 0.480 | 0.418 | 0.436 | 0.467 | 0.478 |
JDSH[63] | 0.724 | 0.734 | 0.741 | 0.745 | 0.710 | 0.720 | 0.733 | 0.720 | 0.449 | 0.472 | 0.478 | 0.484 | 0.447 | 0.477 | 0.473 | 0.486 |
DGCPN[64] | 0.711 | 0.723 | 0.737 | 0.748 | 0.695 | 0.707 | 0.725 | 0.731 | 0.465 | 0.485 | 0.486 | 0.495 | 0.467 | 0.488 | 0.491 | 0.497 |
UCH[13] | 0.654 | 0.669 | 0.679 | / | 0.661 | 0.667 | 0.668 | / | 0.447 | 0.471 | 0.485 | / | 0.446 | 0.469 | 0.488 | / |
UCCH | 0.739 | 0.744 | 0.754 | 0.760 | 0.725 | 0.725 | 0.743 | 0.747 | 0.478 | 0.491 | 0.503 | 0.508 | 0.474 | 0.488 | 0.503 | 0.508 |

Comparison with the state of the art in terms of mAP on NUS-WIDE (NUS) and MS-COCO (COCO). I→T denotes image-to-text retrieval, T→I denotes text-to-image retrieval, and 16/32/64/128 are the hash-code lengths in bits.

Method | NUS I→T 16 | NUS I→T 32 | NUS I→T 64 | NUS I→T 128 | NUS T→I 16 | NUS T→I 32 | NUS T→I 64 | NUS T→I 128 | COCO I→T 16 | COCO I→T 32 | COCO I→T 64 | COCO I→T 128 | COCO T→I 16 | COCO T→I 32 | COCO T→I 64 | COCO T→I 128 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CVH[20] | 0.487 | 0.495 | 0.456 | 0.419 | 0.470 | 0.475 | 0.444 | 0.412 | 0.503 | 0.504 | 0.471 | 0.425 | 0.506 | 0.508 | 0.476 | 0.429 |
LSSH[59] | 0.442 | 0.457 | 0.450 | 0.451 | 0.473 | 0.482 | 0.471 | 0.457 | 0.484 | 0.525 | 0.542 | 0.551 | 0.490 | 0.522 | 0.547 | 0.560 |
CMFH[60] | 0.339 | 0.338 | 0.343 | 0.339 | 0.306 | 0.306 | 0.306 | 0.306 | 0.366 | 0.369 | 0.370 | 0.365 | 0.346 | 0.346 | 0.346 | 0.346 |
FSH[18] | 0.557 | 0.565 | 0.598 | 0.635 | 0.569 | 0.604 | 0.651 | 0.666 | 0.539 | 0.549 | 0.576 | 0.587 | 0.537 | 0.524 | 0.564 | 0.573 |
DLFH[23] | 0.385 | 0.399 | 0.443 | 0.445 | 0.421 | 0.421 | 0.462 | 0.474 | 0.522 | 0.580 | 0.614 | 0.631 | 0.444 | 0.489 | 0.513 | 0.534 |
MTFH[16] | 0.297 | 0.297 | 0.272 | 0.328 | 0.353 | 0.314 | 0.399 | 0.410 | 0.399 | 0.293 | 0.295 | 0.395 | 0.335 | 0.374 | 0.300 | 0.334 |
FOMH[58] | 0.305 | 0.305 | 0.306 | 0.314 | 0.302 | 0.304 | 0.300 | 0.306 | 0.378 | 0.514 | 0.571 | 0.601 | 0.368 | 0.484 | 0.559 | 0.595 |
DCH[34] | 0.392 | 0.422 | 0.430 | 0.436 | 0.379 | 0.432 | 0.444 | 0.459 | 0.422 | 0.420 | 0.446 | 0.468 | 0.421 | 0.428 | 0.454 | 0.471 |
UGACH[61] | 0.613 | 0.623 | 0.628 | 0.631 | 0.603 | 0.614 | 0.640 | 0.641 | 0.553 | 0.599 | 0.598 | 0.615 | 0.581 | 0.605 | 0.629 | 0.635 |
DJSRH[62] | 0.502 | 0.538 | 0.527 | 0.556 | 0.465 | 0.532 | 0.538 | 0.545 | 0.501 | 0.563 | 0.595 | 0.615 | 0.494 | 0.569 | 0.604 | 0.622 |
JDSH[63] | 0.647 | 0.656 | 0.679 | 0.680 | 0.649 | 0.669 | 0.689 | 0.699 | 0.579 | 0.628 | 0.647 | 0.662 | 0.578 | 0.634 | 0.659 | 0.672 |
DGCPN[64] | 0.610 | 0.614 | 0.635 | 0.641 | 0.617 | 0.621 | 0.642 | 0.647 | 0.552 | 0.590 | 0.602 | 0.596 | 0.564 | 0.590 | 0.597 | 0.597 |
UCH[13] | / | / | / | / | / | / | / | / | 0.521 | 0.534 | 0.547 | / | 0.499 | 0.519 | 0.545 | / |
UCCH | 0.698 | 0.708 | 0.737 | 0.742 | 0.701 | 0.724 | 0.745 | 0.750 | 0.605 | 0.645 | 0.655 | 0.665 | 0.610 | 0.655 | 0.666 | 0.677 |
If you find UCCH useful in your research, please consider citing:
@article{hu2022UCCH,
title={Unsupervised Contrastive Cross-modal Hashing},
author={Hu, Peng and Zhu, Hongyuan and Lin, Jie and Peng, Dezhong and Zhao, Yin-Ping and Peng, Xi},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
volume={45},
number={3},
pages={3877-3889},
doi={10.1109/TPAMI.2022.3177356}
}