huawei-noah / Pretrained-IPT


The code may not be consistent with the formula #49

Closed kafmws closed 1 year ago

kafmws commented 1 year ago

In the paper, the description of the encoder in Section 3.1 (IPT architecture) is:

$y_0 = [E_{p1} + f_{p1}, E_{p2} + f_{p2}, \dots, E_{pN} + f_{pN}],$ $q_i = k_i = v_i = LN(y_{i-1}),$ $y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$ $\cdots$
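Read literally, this adds the position embeddings once when forming $y_0$, and every layer then feeds the same normalized tensor in as queries, keys and values. A minimal PyTorch sketch of that reading (class and variable names are mine, dropout omitted; this is not the repo's code):

```python
import torch
import torch.nn as nn

class PaperFormulaEncoderLayer(nn.Module):
    """Pre-norm layer following the paper text literally: q_i = k_i = v_i = LN(y_{i-1})."""
    def __init__(self, d_model, nhead, dim_feedforward=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_feedforward), nn.ReLU(),
                                 nn.Linear(dim_feedforward, d_model))

    def forward(self, y):
        y2 = self.norm1(y)                     # q_i = k_i = v_i = LN(y_{i-1})
        y = y + self.self_attn(y2, y2, y2)[0]  # y_i' = MSA(q_i, k_i, v_i) + y_{i-1}
        return y + self.ffn(self.norm2(y))     # y_i = FFN(LN(y_i')) + y_i'

N, batch, d_model = 16, 2, 64
f = torch.randn(N, batch, d_model)             # flattened patch features f_{pi}
E = torch.randn(N, 1, d_model)                 # position embeddings E_{pi}
y = f + E                                      # y_0 = [E_{p1} + f_{p1}, ..., E_{pN} + f_{pN}]
for layer in [PaperFormulaEncoderLayer(d_model, nhead=4) for _ in range(2)]:
    y = layer(y)
print(y.shape)                                 # torch.Size([16, 2, 64])
```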

However, in the class TransformerEncoderLayer in the Python file model/ipt.py, the code reads:

```python
class TransformerEncoderLayer(nn.Module):

    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, no_norm = False,
                 activation="relu"):
        ...

    def with_pos_embed(self, tensor, pos):
        return tensor if pos is None else tensor + pos

    def forward(self, src, pos = None):
        src2 = self.norm1(src)                   # here
        q = k = self.with_pos_embed(src2, pos)   # here
        src2 = self.self_attn(q, k, src2)        # here
        src = src + self.dropout1(src2[0])
        src2 = self.norm2(src)
        src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
        src = src + self.dropout2(src2)
        return src
```

It's more likely the formula should be: $v_i = LN([f_{p1}, f_{p2}, \dots, f_{pN}]),$ $y_0 = [E_{p1} + f_{p1}, E_{p2} + f_{p2}, \dots, E_{pN} + f_{pN}],$ $q_i = k_i = LN(y_{i-1}).$

Or are they equivalent? I'm confused. Thanks.
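For reference, a rough way to compare the two orderings with the same weights (a sketch with my own variable names, dropout omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_model, nhead, N, batch = 64, 4, 16, 2

# One shared set of weights, so any difference comes from the ordering alone.
attn = nn.MultiheadAttention(d_model, nhead)
norm1, norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
lin1, lin2 = nn.Linear(d_model, 256), nn.Linear(256, d_model)

def ffn_part(y):
    return y + lin2(F.relu(lin1(norm2(y))))

def paper_order(f, E):
    # Paper text: y_0 = E + f, then q = k = v = LN(y_{i-1}).
    y = f + E
    y2 = norm1(y)
    y = y + attn(y2, y2, y2)[0]
    return ffn_part(y)

def code_order(f, E):
    # ipt.py: normalize first, add the position embedding to q and k only.
    y2 = norm1(f)
    q = k = y2 + E
    y = f + attn(q, k, y2)[0]
    return ffn_part(y)

f = torch.randn(N, batch, d_model)   # patch features
E = torch.randn(N, 1, d_model)       # position embeddings, broadcast over batch

with torch.no_grad():
    gap = (paper_order(f, E) - code_order(f, E)).abs().max()
print(gap)   # non-zero for random weights, so the two forms do not coincide in general
```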

HantingChen commented 1 year ago

Sorry for the misleading description. The two formulas are not equivalent, and the second one is correct.

kafmws commented 1 year ago

Thanks for your reply.