```python
class Transformer(nn.Module):
    def forward(self, x: Tensor, *args, **kwargs) -> Tensor:
        for attn, ffn in zip(self.layers, self.ffn_layers):
            # print(x.shape)
            x, _ = attn(x, x, x, is_causal=True, *args, **kwargs)
            x = x + x
            x = ffn(x) + x
        return x
```
Is the line `x = x + x` wrong? It doubles the attention output instead of adding the pre-attention input back, so it doesn't look like a residual connection.
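For comparison, here is a minimal sketch of the usual residual pattern, where the sublayer *input* is added to the sublayer *output* (all names here, e.g. `TinyBlock`, are hypothetical and not from `bit_transformer.py`):

```python
import torch
from torch import nn, Tensor


class TinyBlock(nn.Module):
    """Hypothetical block illustrating standard residual connections."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: Tensor) -> Tensor:
        attn_out, _ = self.attn(x, x, x)  # self-attention sublayer
        x = x + attn_out                  # residual: input + sublayer output
        x = x + self.ffn(x)               # second residual around the FFN
        return x
```

Note the contrast: after `x, _ = attn(...)` reassigns `x`, the original input is gone, so `x = x + x` can only double the attention output; the residual stream needs to be kept in a separate variable.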
(The snippet above is from `bit_transformer.py`.)