Closed: shkids closed this issue 7 months ago
Ideally that shouldn't be the case, since all three operations are computed in parallel and independently. Can you provide a reproducible experiment where we can observe this? Also, are you sure the seeds were the same between the two runs and that there was no other source of randomness?
Thank you for your response. The code below is my modified version. When tested against the existing code on custom data, the modified code showed a slight improvement. I wonder if there is a cause I'm not aware of.
The experimental model was YOLOv8m; the attention module was combined with the Bottleneck block, and the same combination was used in all experiments.
```python
import torch
import torch.nn as nn

# `Conv` is Ultralytics' standard conv block (Conv2d + BatchNorm + activation).
# Inside the Ultralytics repo it is usually already in scope; the import path
# below is for recent package versions and may vary.
from ultralytics.nn.modules import Conv


class ZPool(nn.Module):
    """Concatenate channel-wise max and mean maps into a 2-channel tensor."""

    def forward(self, x):
        return torch.cat(
            (torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1
        )


class AttentionGate(nn.Module):
    def __init__(self):
        super(AttentionGate, self).__init__()
        kernel_size = 7
        self.compress = ZPool()
        self.conv = Conv(
            2, 1, k=kernel_size, s=1, p=(kernel_size - 1) // 2, act=False
        )

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.conv(x_compress)
        scale = torch.sigmoid(x_out)
        return x * scale


class TripletAttention(nn.Module):
    def __init__(self, no_spatial=False):
        super(TripletAttention, self).__init__()
        self.cw = AttentionGate()
        self.hc = AttentionGate()
        self.hw = AttentionGate()

    def forward(self, x):
        # H-W branch: standard spatial attention on (N, C, H, W)
        x_hw = self.hw(x)
        # H-C branch: rotate to (N, W, H, C) so the gate pools over W
        x_hc = self.hc(x.permute(0, 3, 2, 1).contiguous()).permute(0, 3, 2, 1).contiguous()
        # C-W branch: rotate to (N, H, C, W) so the gate pools over H
        x_cw = self.cw(x.permute(0, 2, 1, 3).contiguous()).permute(0, 2, 1, 3).contiguous()
        # Average the three branch outputs
        x_out = 1 / 3 * (x_hw + x_hc + x_cw)
        return x_out
```
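For readers following along, the permute bookkeeping in `TripletAttention.forward` can be sanity-checked without PyTorch: each branch rotates the tensor so a different pair of axes faces the 2-D attention gate, and the same permutation applied again restores the original layout. A minimal shapes-only sketch (the helper `permute_shape` is hypothetical, written here just for illustration):

```python
def permute_shape(shape, dims):
    """Return the shape a tensor would have after tensor.permute(*dims)."""
    return tuple(shape[d] for d in dims)

nchw = (2, 64, 32, 32)  # (N, C, H, W)

# H-C branch: permute(0, 3, 2, 1) -> (N, W, H, C); the gate pools over W
hc_view = permute_shape(nchw, (0, 3, 2, 1))
assert hc_view == (2, 32, 32, 64)

# (0, 3, 2, 1) is its own inverse, so the second permute restores (N, C, H, W)
assert permute_shape(hc_view, (0, 3, 2, 1)) == nchw

# C-W branch: permute(0, 2, 1, 3) -> (N, H, C, W); the gate pools over H
cw_view = permute_shape(nchw, (0, 2, 1, 3))
assert cw_view == (2, 32, 64, 32)
assert permute_shape(cw_view, (0, 2, 1, 3)) == nchw
```

Both permutations used in the snippet are involutions (applying them twice is the identity), which is why the forward pass can reuse the same `permute` arguments to rotate back.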
@byunsunghun Sorry for my late response. From your snippet I don't see anything obvious that would explain the improvement you mentioned. However, as I stated, if the seeds were not fixed between the two experiments, or if any other source of randomness differed, that can cause variance in performance. It would be best to do multi-seed runs and benchmark the mean and variance of the performance of the two settings.
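The multi-seed protocol suggested above can be sketched in a few lines. Here `run_experiment` is a hypothetical stand-in for a full training run that returns a validation score; in practice you would launch a real training with each seed and collect the metric:

```python
import random
import statistics

def run_experiment(seed):
    """Hypothetical stand-in for one full training run; returns a mock mAP."""
    rng = random.Random(seed)
    return 0.50 + rng.gauss(0, 0.005)  # base score plus run-to-run noise

def benchmark(seeds):
    """Mean and standard deviation of the metric across seeds."""
    scores = [run_experiment(s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

mean_a, std_a = benchmark(seeds=[0, 1, 2, 3, 4])
print(f"setting A: {mean_a:.4f} +/- {std_a:.4f}")
```

If the gap between two settings is smaller than roughly two of these standard deviations, it is hard to distinguish from seed noise alone.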
Thank you for your reply. The seeds in both experiments were always fixed; we will separately investigate whether other sources of randomness cause the performance differences.
Happy new year!
Keep me posted, happy new year to you too!
Hi! Sorry for my late response, replying now after 7 months. The seeds in both experiments were always fixed, but as you mentioned earlier, other sources of randomness caused the performance differences. The question has been resolved. Thank you!
One more observation: when the operation order was changed from cw > hc > hw to hw > cw > hc, performance improved in certain models, even though the calculations are performed independently of each other. Do you know why this is?
Thank you.
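A plausible (unconfirmed in this thread) explanation for that last question: even with a fixed global seed, changing the order in which the three `AttentionGate` modules are constructed changes which random draws each conv layer receives at initialization, so the two runs start from genuinely different weights; reordering only the final sum can also shift results slightly because floating-point addition is not associative. A minimal stdlib analogue of the first effect:

```python
import random

# Order 1: "module A" is initialized first, then "module B"
random.seed(42)
a_first = random.random()
b_first = random.random()

# Order 2: same seed, but the construction order is swapped
random.seed(42)
b_second = random.random()
a_second = random.random()

# The RNG stream is identical, but each module gets a different draw,
# so the two runs start from different "weights" despite the fixed seed.
assert a_first != a_second and b_first != b_second
assert a_first == b_second and b_first == a_second
```

The same consumption-order effect applies to `torch.manual_seed` and layer initialization, which is why a fixed seed alone does not make two differently-ordered architectures comparable run-for-run.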