lucidrains / h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
MIT License

Mask not working #15

Closed. wwx13 closed this issue 3 years ago.

wwx13 commented 3 years ago
def forward(self, x, mask = None):
    b, n, device = *x.shape, x.device
    assert n <= self.max_seq_len, 'sequence length must be less than the maximum sequence length'
    x = self.token_emb(x)
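    # note: mask is accepted above but never used -- it is not passed into self.layers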
    x = self.layers(x)
    return self.to_logits(x)

I think masking does not work? The mask argument is accepted by forward, but it is never used anywhere: it is not passed into self.layers, so it gets silently ignored.
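For reference, here is a minimal, self-contained sketch of the pattern I mean, with toy stand-ins for the real modules (ToyAttentionLayer and ToyWrapper are illustrative only, not the actual H-attention internals): the mask just needs to be forwarded into the layers instead of being dropped.

import torch
from torch import nn

# toy stand-in for an attention layer, only here to show a layer that actually consumes the mask
class ToyAttentionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask = None):
        if mask is not None:
            # zero out padded positions so they do not leak into the output
            x = x * mask[..., None]
        return self.proj(x)

# toy stand-in for the wrapper: forward threads the mask through to every layer
class ToyWrapper(nn.Module):
    def __init__(self, num_tokens, dim, depth, max_seq_len):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.token_emb = nn.Embedding(num_tokens, dim)
        self.layers = nn.ModuleList([ToyAttentionLayer(dim) for _ in range(depth)])
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, x, mask = None):
        b, n = x.shape
        assert n <= self.max_seq_len, 'sequence length must not exceed the maximum sequence length'
        x = self.token_emb(x)
        for layer in self.layers:
            x = layer(x, mask = mask)  # mask is forwarded, not silently dropped
        return self.to_logits(x)

x = torch.randint(0, 256, (2, 128))
mask = torch.ones(2, 128).bool()
logits = ToyWrapper(num_tokens = 256, dim = 64, depth = 2, max_seq_len = 512)(x, mask = mask)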

wwx13 commented 3 years ago

Hey, I was wondering whether H-Transformer-1D supports an input mask? Hoping for your reply ^ ^
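For context, this is roughly what I am hoping to be able to do. The constructor arguments below are based on my reading of the README and may not match the exact names in the current version; the point is just the mask = mask keyword in the call.

import torch
from h_transformer_1d import HTransformer1D

# argument names follow the README example as I understand it (may differ slightly)
model = HTransformer1D(
    num_tokens = 256,
    dim = 512,
    depth = 6,
    max_seq_len = 8192,
    heads = 8,
    block_size = 128,
    causal = False
)

x = torch.randint(0, 256, (1, 8000))
mask = torch.ones(1, 8000).bool()  # padding mask I would like to be respected

logits = model(x, mask = mask)     # currently the mask seems to be ignored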

lucidrains commented 3 years ago

@wwx13 oops! thanks for catching this!