lucidrains / h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
MIT License

Add Norm Missing #16

Closed wwx13 closed 3 years ago

wwx13 commented 3 years ago

I am using the code now, and I wonder whether add & norm is implemented. I can only find the layer norm, but no add (residual) operation. Here is the code in h-transformer-1d.py at line 489 ... Is this a bug or something? Thanks @lucidrains

    for ind in range(depth):
        attn = attn_class(dim, dim_head = dim_head, heads = heads, block_size = block_size, pos_emb = self.pos_emb, **attn_kwargs)
        ff = FeedForward(dim, mult = ff_mult)

        if shift_tokens:
            attn, ff = map(lambda t: PreShiftTokens(shift_token_ranges, t), (attn, ff))

        attn, ff = map(lambda t: PreNorm(dim, t), (attn, ff))
        layers.append(nn.ModuleList([attn, ff]))
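For context, a PreNorm wrapper in this style of codebase only applies the layer norm before calling the wrapped module; it does not perform the residual add itself. A minimal sketch of that pattern (assumed shape, not copied verbatim from the repo):

    import torch.nn as nn

    class PreNorm(nn.Module):
        # normalizes the input, then calls the wrapped attention / feedforward module;
        # the residual add happens outside this wrapper
        def __init__(self, dim, fn):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.fn = fn

        def forward(self, x, **kwargs):
            return self.fn(self.norm(x), **kwargs)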
wwx13 commented 3 years ago

@lucidrains

wwx13 commented 3 years ago

execute_type = ReversibleSequence if reversible else SequentialSequence

Oops, I found the add & norm! Closing.
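In other words, the residual add lives in the execution wrapper (SequentialSequence / ReversibleSequence) rather than in PreNorm. A minimal sketch of a SequentialSequence-style wrapper that adds the residual around each pre-normed block (assumed shape, not copied verbatim from the repo):

    import torch.nn as nn

    class SequentialSequence(nn.Module):
        # executes (attn, ff) pairs in order, adding the residual around each pre-normed block
        def __init__(self, layers):
            super().__init__()
            self.layers = layers

        def forward(self, x, **kwargs):
            for attn, ff in self.layers:
                x = attn(x, **kwargs) + x   # add: residual around pre-norm attention
                x = ff(x) + x               # add: residual around pre-norm feedforward
            return x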