landskape-ai / triplet-attention

Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
https://openaccess.thecvf.com/content/WACV2021/html/Misra_Rotate_to_Attend_Convolutional_Triplet_Attention_Module_WACV_2021_paper.html
MIT License

Pretrained weights and experiment setup for resnet20/32 #26

Closed Stwyb closed 7 months ago

Stwyb commented 7 months ago

Thank you for sharing your work. The interaction of information between different dimensions is very interesting, and it has been very helpful to me.

When I tried to reproduce triplet attention on ResNet-20/32, I ran into a small problem: the accuracy of both my baseline model and the model with triplet attention applied was lower than the results reported in your paper. Could you share your pretrained weights and experimental setup? Thanks a lot!

digantamisra98 commented 7 months ago

Hello. All our pretrained models are available here. Regarding the lower performance, can you share the exact config you used, what dataset you trained on, and how much lower the accuracy is?
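Loading one of the released checkpoints would look roughly like the following. This is only a sketch: the filename, the checkpoint layout (raw state_dict vs. a wrapped dict), and the resnet18 placeholder architecture are assumptions, not details of the actual release.

```python
import torch
import torchvision

# Placeholder architecture; swap in the matching ResNet + triplet attention definition.
model = torchvision.models.resnet18(num_classes=10)

# "triplet_attention_resnet20.pth" is a hypothetical filename, not an actual release asset.
ckpt = torch.load("triplet_attention_resnet20.pth", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# strict=False tolerates key-name mismatches between the checkpoint and the placeholder model.
model.load_state_dict(state_dict, strict=False)
model.eval()
```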

Stwyb commented 7 months ago

Hello. Thank you for your prompt reply.

  1. The dataset I used is CIFAR-10.
  2. For data augmentation, each image is zero-padded with 4 pixels on each side, and a 32 × 32 patch is randomly cropped from the padded image or its horizontal flip. The inputs are then normalised using the CIFAR-10 RGB mean and standard deviation.
  3. For optimization, I use a synchronous SGD optimizer with a weight decay of 5e-4, momentum of 0.9, and a mini-batch size of 128. The initial learning rate is 0.1 and is divided by 5 at the 60th, 120th, and 160th epochs; all models are trained from scratch for 200 epochs (a sketch of this setup follows the list).
  4. For the baseline I used the standard ResNet-20 and ResNet-32 (with a convolutional kernel size of 3). For the models with triplet attention, I added triplet attention at the end of each BasicBlock, with reduction_ratio=16 (see the block sketch further below).
  5. My results are: baseline accuracy 91.25% (ResNet-20) and 92.49% (ResNet-32); with triplet attention, 91.68% (ResNet-20) and 92.72% (ResNet-32). There is still a small gap compared to your reported results (93.12% for ResNet-32 in Table 5 of the paper and 92.66% for ResNet-20 in Table 1 of the supplementary material).
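For reference, here is a minimal sketch of points 2–3 in code (standard torchvision transforms, SGD with step decay). The normalization statistics are the commonly used CIFAR-10 values and the resnet18 backbone is just a placeholder; neither is taken from your repo.

```python
import torch
import torchvision
import torchvision.transforms as T

# Assumed: the commonly used CIFAR-10 channel statistics (not taken from the repo).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # zero-pad 4 px per side, then random 32x32 crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

# Placeholder backbone; swap in the ResNet-20/32 variant with triplet attention.
model = torchvision.models.resnet18(num_classes=10)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 5 (gamma = 0.2) at epochs 60, 120, and 160; 200 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60, 120, 160], gamma=0.2)
```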

I guess the problem could be with reduction_ratio, because of the difference in the number of channels between ResNet-20 and ResNet-18.
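This is roughly how I insert the module into each block (point 4 above). The import path and the TripletAttention constructor arguments are assumptions on my side and may not match your repo exactly; the block itself is the standard CIFAR-style BasicBlock with a projection shortcut, not code from this repository.

```python
import torch.nn as nn
from triplet_attention import TripletAttention  # assumed import path

class BasicBlockWithTriplet(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1, reduction_ratio=16):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # Attention at the end of the residual branch, before the skip connection is added.
        # Constructor arguments are assumed; the repo's signature may differ.
        self.attn = TripletAttention(planes, reduction_ratio=reduction_ratio)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            # Projection shortcut used here for simplicity.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attn(out)
        out = out + self.shortcut(x)
        return self.relu(out)
```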

Thanks again for your help!

digantamisra98 commented 7 months ago

Yes, I believe correct tuning of reduction_ratio is required. Please feel free to reopen this issue if it is still not resolved.