vasgaowei / TS-CAM

Codes for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.
https://openaccess.thecvf.com/content/ICCV2021/papers/Gao_TS-CAM_Token_Semantic_Coupled_Attention_Map_for_Weakly_Supervised_Object_ICCV_2021_paper.pdf
Apache License 2.0
133 stars, 25 forks

How can I get performance (ImageNet-1K) like in your paper? #12

Open SejinPark99 opened 2 years ago

SejinPark99 commented 2 years ago

Hello. First of all, thank you for your code and paper. Your paper caught my attention! But I have a problem: when I train with the default settings, the performance is not as good as what you report. So, if possible, could you tell me your settings? Something like this: https://github.com/vasgaowei/TS-CAM/issues/7#issuecomment-927473246

I'm hoping for your answer :+1: :+1:

I am looking forward to your great papers. Thank you :)

Here is my hyper-parameter setting.

{'BASIC': {'BACKUP_CODES': True, 'BACKUP_LIST': ['lib', 'tools_cam', 'configs'], 'DISP_FREQ': 10, 'GPU_ID': [0], 'NUM_WORKERS': 8, 'ROOT_DIR': './tools_cam/..', 'SAVE_DIR': 'ckpt/ImageNet/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.12_BS256_2022-02-02-14-27', 'SEED': 26, 'TIME': '2022-02-02-14-27'},
 'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
 'DATA': {'CROP_SIZE': 224, 'DATADIR': 'data/ImageNet_ILSVRC2012', 'DATASET': 'ImageNet', 'IMAGE_MEAN': [0.485, 0.456, 0.406], 'IMAGE_STD': [0.229, 0.224, 0.225], 'NUM_CLASSES': 1000, 'RESIZE_SIZE': 512, 'SCALE_LENGTH': 15, 'SCALE_SIZE': 196},
 'MODEL': {'ARCH': 'deit_tscam_small_patch16_224', 'CAM_THR': 0.12, 'LOCALIZER_DIR': '', 'TOP_K': 1},
 'SOLVER': {'LR_FACTOR': 0.1, 'LR_STEPS': [10, 12], 'MUMENTUM': 0.9, 'NUM_EPOCHS': 20, 'START_LR': 0.004, 'WEIGHT_DECAY': 0.0005},
 'TEST': {'BATCH_SIZE': 512, 'CKPT_DIR': '', 'SAVE_BOXED_IMAGE': False, 'SAVE_CAMS': False, 'TEN_CROPS': False},
 'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 256, 'BETA': 1.0}}
==> Preparing data... done!
==> Preparing networks for baseline...
Removing key head.weight from pretrained checkpoint
TSCAM(

And the result is:

Val Epoch: [12][98/98] Loss 1.4334 (1.4654)
Cls@1:0.657 Cls@5:0.858 Loc@1:0.451 Loc@5:0.564 Loc_gt:0.609
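
For anyone comparing numbers like these against the paper, the validation line above can be turned into a dictionary with a few lines of Python. This is only a minimal sketch: `parse_metrics` is a hypothetical helper written for this thread, not part of the TS-CAM code, and it assumes the log keeps the `Name:value` layout shown above.

```python
import re


def parse_metrics(line):
    """Parse 'Name:value' pairs such as 'Cls@1:0.657' into a dict of floats.

    Hypothetical helper; assumes the space-separated layout of the log line
    quoted in this issue.
    """
    pairs = re.findall(r"([A-Za-z_@0-9]+):([0-9]*\.?[0-9]+)", line)
    return {name: float(value) for name, value in pairs}


# The validation line copied from the log above.
line = "Cls@1:0.657 Cls@5:0.858 Loc@1:0.451 Loc@5:0.564 Loc_gt:0.609"
metrics = parse_metrics(line)

# Print each metric as a percentage, the form usually used in result tables.
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

The resulting dict makes it easy to track Loc@1 or Loc_gt across epochs instead of reading them off the log by eye.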

vasgaowei commented 2 years ago

Hello, thanks for your attention. Which transformer do you use as the backbone: deit-tiny, deit-small or deit-large? If you choose deit-small with the default settings (e.g., batch_size, learning rate, MODEL.CAM_THR), the performance should not differ much.
Alternatively, you can download the pretrained model we provide and test its performance.
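
One way to check whether a run really uses the default settings is to diff the printed config against a reference config, key by key. The sketch below is an illustration only: `flatten` and `diff_configs` are hypothetical helpers, and the `reference` values are placeholders, not the repository's actual defaults. Replace them with the values from the config file you trained with.

```python
def flatten(cfg, prefix=""):
    """Flatten a nested dict into dotted keys, e.g. {'MODEL.CAM_THR': 0.12}."""
    flat = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_configs(mine, reference):
    """Return {dotted_key: (my_value, reference_value)} for every differing key."""
    a, b = flatten(mine), flatten(reference)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# A few entries taken from the config printed in this issue.
mine = {
    "MODEL": {"ARCH": "deit_tscam_small_patch16_224", "CAM_THR": 0.12},
    "SOLVER": {"START_LR": 0.004, "LR_STEPS": [10, 12]},
    "TRAIN": {"BATCH_SIZE": 256},
}

# PLACEHOLDER reference values, invented for this example only.
reference = {
    "MODEL": {"ARCH": "deit_tscam_small_patch16_224", "CAM_THR": 0.1},
    "SOLVER": {"START_LR": 0.004, "LR_STEPS": [10, 12]},
    "TRAIN": {"BATCH_SIZE": 256},
}

for key, (my_value, ref_value) in diff_configs(mine, reference).items():
    print(f"{key}: mine={my_value} reference={ref_value}")
```

Keys that appear in only one of the two configs show up with `None` on the other side, so missing options are reported as well.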