lucidrains / local-attention

An implementation of local windowed attention for language modeling
MIT License

Please add option for exact window_size masking #3

Closed · usamec closed this issue 3 years ago

usamec commented 3 years ago
        # mask out any key position more than window_size away from its query position
        exact_mask = torch.abs(bq_t[:, :, :, None] - bq_k[:, :, None, :]) > self.window_size
        dots.masked_fill_(exact_mask, mask_value)

This could be hidden behind a flag, similar to shared_qk.
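For illustration, here is a minimal standalone sketch of the exact-window masking idea above. The variable names (q_pos, k_pos) are stand-ins for the bucketed position tensors bq_t / bq_k and are not the library's internal code:

```python
import torch

window_size = 4
q_pos = torch.arange(8)[None, :, None]   # (1, seq_q, 1) absolute query positions
k_pos = torch.arange(8)[None, None, :]   # (1, 1, seq_k) absolute key positions

dots = torch.randn(1, 8, 8)              # attention logits
mask_value = -torch.finfo(dots.dtype).max

# positions farther apart than window_size are masked out exactly
exact_mask = (q_pos - k_pos).abs() > window_size
dots.masked_fill_(exact_mask, mask_value)
```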

lucidrains commented 3 years ago

@usamec Hi Vlado! This is a great suggestion! Building it now.

lucidrains commented 3 years ago

https://github.com/lucidrains/local-attention/releases/tag/1.2.0
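For reference, a hedged usage sketch of the released option. The flag name `exact_windowsize` and the tensor shapes below are assumptions based on the library's current README; check the 1.2.0 release notes for the exact interface of that version:

```python
import torch
from local_attention import LocalAttention

# assumed interface; argument names and expected shapes may differ in 1.2.0
attn = LocalAttention(
    dim = 64,                 # per-head dimension
    window_size = 128,        # size of each local attention window
    causal = True,
    exact_windowsize = True   # mask so each query attends at most window_size tokens back
)

q = torch.randn(2, 8, 1024, 64)  # (batch, heads, seq, head_dim)
k = torch.randn(2, 8, 1024, 64)
v = torch.randn(2, 8, 1024, 64)

out = attn(q, k, v)              # same shape as q
```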