implus / PytorchInsight

A PyTorch library with state-of-the-art architectures, pretrained models, and real-time updated results

Is the operation in SGE-Block equivalent to GroupNorm? #33

Open mrT23 opened 4 years ago

mrT23 commented 4 years ago

Hi. I have two questions:

Question 1:

        t = t - t.mean(dim=1, keepdim=True)
        std = t.std(dim=1, keepdim=True) + 1e-5
        t = t / std
        t = t.view(b, self.groups, h, w)
        t = t * self.weight + self.bias

Is this code equivalent to BatchNorm (or GroupNorm)? If so, shouldn't we use running_mean and running_var to stabilize the statistics and improve convergence?
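One way to check this empirically: since the snippet normalizes each group's coefficients over the spatial positions of a single sample (no cross-batch statistics), it behaves like GroupNorm with one channel per group (i.e., InstanceNorm), not like BatchNorm, which is why there are no running statistics. A minimal sketch with hypothetical tensor sizes, comparing the snippet (minus the learnable affine step) against `nn.GroupNorm`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
b, groups, h, w = 2, 4, 32, 32  # hypothetical sizes for illustration

# The snippet from the issue, without the learnable affine step
t = torch.randn(b * groups, h * w)
t_norm = t - t.mean(dim=1, keepdim=True)
std = t.std(dim=1, keepdim=True) + 1e-5
t_norm = t_norm / std
t_norm = t_norm.view(b, groups, h, w)

# GroupNorm with one channel per group (= InstanceNorm) on the same tensor
gn = nn.GroupNorm(num_groups=groups, num_channels=groups, affine=False)
gn_out = gn(t.view(b, groups, h, w))

# Close but not bit-identical: the snippet uses the unbiased std and adds
# eps to the std, while GroupNorm uses the biased variance and adds eps to it.
print(torch.allclose(t_norm, gn_out, atol=1e-2))  # True
```

Because the statistics are computed per sample (like GroupNorm/InstanceNorm, unlike BatchNorm), running estimates would have nothing to average over at test time.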

Question 2:
        xn = xn.sum(dim=1, keepdim=True)

What is the logic behind this line? Why are we summing along the groups?

Thanks a lot, Tal

Haus226 commented 2 months ago

For question 2, I think it is used to reduce the weighted channels in each group, yielding the single-channel attention map $a$ for that group.
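A shape walk-through supports this reading: after the view to (b*groups, c//groups, h, w), dim=1 indexes the channels inside a group, so the sum collapses the per-channel similarity scores into one attention map per group. A minimal sketch with hypothetical sizes (the global average pool is written out with `mean` here):

```python
import torch

# Hypothetical sizes for illustration
b, c, h, w, groups = 2, 64, 8, 8, 8
x = torch.randn(b * groups, c // groups, h, w)  # each group handled as a separate sample

g = x.mean(dim=(2, 3), keepdim=True)  # global descriptor per channel (the avg_pool step)
xn = x * g                            # channel-wise similarity with the group descriptor
xn = xn.sum(dim=1, keepdim=True)      # dim=1 = channels within a group: reduces
                                      # (b*groups, c//groups, h, w) to (b*groups, 1, h, w),
                                      # one spatial attention map per group
print(xn.shape)  # torch.Size([16, 1, 8, 8])
```

So the sum is not over the groups themselves but over the channels inside each group; the groups stay separate because they were folded into the batch dimension.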