landskape-ai / triplet-attention

Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
https://openaccess.thecvf.com/content/WACV2021/html/Misra_Rotate_to_Attend_Convolutional_Triplet_Attention_Module_WACV_2021_paper.html
MIT License

I Have a question #25

Closed: shkids closed this issue 7 months ago

shkids commented 10 months ago

When I changed the order of the branch operations from cw > hc > hw to hw > cw > hc, performance improved in certain models, even though the three calculations are performed independently of each other. Do you know why this happens?

Thank you.

digantamisra98 commented 10 months ago

Ideally that shouldn't be the case, since all three operations are computed in parallel and independently of each other. Can you provide a reproducible experiment where we can observe this? Also, are you sure the seeds were the same between the two runs and that there was no other source of randomness?
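For reference, a minimal sketch (not code from this repository) of how the common sources of randomness in a PyTorch run can be pinned down before comparing the two orderings; the exact set of flags needed depends on the training framework and the ops involved:

import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    # Fix the usual sources of randomness in a PyTorch experiment.
    random.seed(seed)                 # Python RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # all GPU RNGs
    os.environ["PYTHONHASHSEED"] = str(seed)
    # cuDNN autotuning and non-deterministic kernels can still introduce
    # run-to-run variance; disabling them trades speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False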

shkids commented 10 months ago

Thank you for your response. The code below is my modified version. When I tested it against the existing code on custom data, the modified version showed a slight improvement, and I wonder whether there is a cause I'm not aware of.

The experiments used the yolov8m model with the module combined into the bottleneck block, and the same combination was used in all experiments.

import torch
import torch.nn as nn
from ultralytics.nn.modules import Conv  # YOLOv8 Conv block (Conv2d + BN + optional activation)


class ZPool(nn.Module):
    """Concatenate channel-wise max and mean maps along dim 1."""

    def forward(self, x):
        return torch.cat(
            (torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1
        )


class AttentionGate(nn.Module):
    """ZPool -> 7x7 conv -> sigmoid gate applied to the input."""

    def __init__(self):
        super(AttentionGate, self).__init__()
        kernel_size = 7
        self.compress = ZPool()
        self.conv = Conv(
            2, 1, k=kernel_size, s=1, p=(kernel_size - 1) // 2, act=False
        )

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.conv(x_compress)
        scale = torch.sigmoid(x_out)
        return x * scale


class TripletAttention(nn.Module):
    def __init__(self, no_spatial=False):
        super(TripletAttention, self).__init__()
        self.cw = AttentionGate()
        self.hc = AttentionGate()
        self.hw = AttentionGate()
        # NOTE: no_spatial is accepted for API compatibility but not used here.

    def forward(self, x):
        # Branch order changed so the hw branch is computed first; each branch only reads x.
        x_hw = self.hw(x)
        x_hc = self.hc(x.permute(0, 3, 2, 1).contiguous()).permute(0, 3, 2, 1).contiguous()
        x_cw = self.cw(x.permute(0, 2, 1, 3).contiguous()).permute(0, 2, 1, 3).contiguous()
        x_out = 1 / 3 * (x_hw + x_hc + x_cw)
        return x_out
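
As a quick sanity check (a sketch assuming the classes above can be instantiated standalone, i.e. Conv resolves to the YOLOv8 Conv block), the branch order cannot change the module's output in eval mode, since each branch only reads the shared input x; at most the floating-point summation order shifts the last bit:

import torch

torch.manual_seed(0)
att = TripletAttention().eval()
x = torch.randn(2, 64, 32, 32)

with torch.no_grad():
    out_hw_first = att(x)  # hw -> hc -> cw, as written in forward() above
    # Recompute the branches in the original cw -> hc -> hw order by hand.
    x_cw = att.cw(x.permute(0, 2, 1, 3).contiguous()).permute(0, 2, 1, 3).contiguous()
    x_hc = att.hc(x.permute(0, 3, 2, 1).contiguous()).permute(0, 3, 2, 1).contiguous()
    x_hw = att.hw(x)
    out_cw_first = 1 / 3 * (x_cw + x_hc + x_hw)

print(torch.allclose(out_hw_first, out_cw_first))  # True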
digantamisra98 commented 10 months ago

@byunsunghun Sorry for my late response. From your snippet I don't see anything obvious that would explain the improvement you mention. However, as I stated, if the seeds were not fixed between the two experiments, that or any other source of randomness can cause variance in performance. It would be best to do multi-seed runs and benchmark the average and variance of the performance in the two settings.
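
A minimal sketch of what such a comparison could look like; train_and_evaluate is a hypothetical placeholder for your YOLOv8 training and validation pipeline, returning a single metric such as mAP50-95:

import statistics

def benchmark(branch_order, seeds=(0, 1, 2, 3, 4)):
    # train_and_evaluate() is a hypothetical stand-in for the actual
    # YOLOv8 training + validation run with the given branch order and seed.
    scores = [train_and_evaluate(branch_order=branch_order, seed=s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

for order in ("cw>hc>hw", "hw>cw>hc"):
    mean, std = benchmark(order)
    print(f"{order}: {mean:.4f} +/- {std:.4f}")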

shkids commented 9 months ago

Thank you for your reply. The seeds in both experiments are always fixed; we will separately investigate whether other sources of randomness in the code cause the performance differences.

Happy new year!

digantamisra98 commented 9 months ago

Keep me posted, happy new year to you too!

shkids commented 2 months ago

Hi! Sorry for the very late response; I'm replying after seven months. The seeds in both experiments were always fixed, but as you mentioned earlier, other sources of randomness caused the performance differences. The question has been resolved. Thank you!