In the paper, the description of the encoder in Section 3.1 (IPT architecture) is:
$$y_0 = [E_{p_1} + f_{p_1},\, E_{p_2} + f_{p_2},\, \dots,\, E_{p_N} + f_{p_N}],$$
$$q_i = k_i = v_i = \mathrm{LN}(y_{i-1}),$$
$$y_i' = \mathrm{MSA}(q_i, k_i, v_i) + y_{i-1},$$
$$\cdots$$
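For concreteness, here is a minimal sketch of how I read these formulas, with `nn.MultiheadAttention` and `nn.LayerNorm` standing in for the repo's modules and arbitrary sizes (`d_model`, `nhead`, `N` are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

d_model, nhead, N = 64, 4, 16      # arbitrary sizes, not from the paper
attn = nn.MultiheadAttention(d_model, nhead)
norm = nn.LayerNorm(d_model)

f = torch.randn(N, 1, d_model)     # patch features f_{p1..pN}
E = torch.randn(N, 1, d_model)     # position embeddings E_{p1..pN}

y = f + E                          # y_0 = [E_{p1}+f_{p1}, ..., E_{pN}+f_{pN}]
q = k = v = norm(y)                # q_i = k_i = v_i = LN(y_{i-1})
y = attn(q, k, v)[0] + y           # y'_i = MSA(q_i, k_i, v_i) + y_{i-1}
```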
However, in the class `TransformerEncoderLayer` of the Python file `model/ipt.py`, the code reads:
```python
class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,
                 no_norm=False, activation="relu"):
        ...

    def with_pos_embed(self, tensor, pos):
        return tensor if pos is None else tensor + pos

    def forward(self, src, pos=None):
        src2 = self.norm1(src)                   # here
        q = k = self.with_pos_embed(src2, pos)   # here
        src2 = self.self_attn(q, k, src2)        # here
        src = src + self.dropout1(src2[0])
        src2 = self.norm2(src)
        src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
        src = src + self.dropout2(src2)
        return src
```
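Reading the forward pass off this code, and assuming the caller passes the raw patch features as `src` and the position embedding as `pos` (so that $y_0 = [f_{p_1}, \dots, f_{p_N}]$), each layer seems to compute

$$q_i = k_i = \mathrm{LN}(y_{i-1}) + E,\qquad v_i = \mathrm{LN}(y_{i-1}),\qquad y_i' = \mathrm{MSA}(q_i, k_i, v_i) + y_{i-1},$$

i.e. the position embedding $E$ is re-added to $q$ and $k$ at every layer and never enters $v$ or the residual path.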
It's more likely the formulas should be:

$$v_i = \mathrm{LN}([f_{p_1},\, f_{p_2},\, \dots,\, f_{p_N}]),$$
$$y_0 = [E_{p_1} + f_{p_1},\, E_{p_2} + f_{p_2},\, \dots,\, E_{p_N} + f_{p_N}],$$
$$q_i = k_i = \mathrm{LN}(y_{i-1}),$$

or maybe they are equivalent? I'm confused. Thanks.
Sorry for the confusion. The two formulas are not equivalent, and the second one is correct.
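A quick numerical check of the inequivalence (a sketch only, with `nn.MultiheadAttention` standing in for the repo's attention module and arbitrary sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, nhead, N = 64, 4, 16          # arbitrary sizes
attn = nn.MultiheadAttention(d_model, nhead)
norm = nn.LayerNorm(d_model)

f = torch.randn(N, 1, d_model)         # patch features f_{p1..pN}
E = torch.randn(N, 1, d_model)         # position embeddings E_{p1..pN}

# Formulation 1, as printed in the paper: q = k = v = LN(y_0) with y_0 = f + E
y0 = f + E
a = norm(y0)
out1 = attn(a, a, a)[0] + y0

# Formulation 2, as implemented: q = k = LN(f) + E, v = LN(f), residual on f
s = norm(f)
out2 = attn(s + E, s + E, s)[0] + f

print(torch.allclose(out1, out2))      # False: the two outputs differ
```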
Thanks for your reply.