lucidrains / local-attention

An implementation of local windowed attention for language modeling

The look_around function seems to be incorrect #18

Closed: datvuthanh closed this issue 1 year ago

datvuthanh commented 1 year ago

According to the definition of F.pad (https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html), the padding order is: (padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back), starting from the last dimension.
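For example, padding the last dimension of a 1-D tensor shows the documented (before, after) ordering:

```python
import torch
import torch.nn.functional as F

x = torch.arange(1, 4)                # tensor([1, 2, 3])

# for the last dimension, the pad pair is (pad_left, pad_right)
print(F.pad(x, (2, 1), value=0))      # tensor([0, 0, 1, 2, 3, 0])
```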

However, the line below seems to have the positions of backward and forward swapped:

```python
padded_x = F.pad(x, (*dims, backward, forward), value = pad_value)
```

The code line should be:

```python
padded_x = F.pad(x, (*dims, forward, backward), value=pad_value)
```
lucidrains commented 1 year ago

it is correct, because if you, say, want to look backwards one bucket, you need one padded bucket on the left. The pad pair for a dimension is (amount before, amount after), so `backward` belongs in the "before" slot.
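A minimal sketch of the mechanism (simplified from the library's look_around; it assumes x is shaped (batch, buckets, bucket_size)) shows why backward padding goes on the left: slicing the left-padded tensor starting at index 0 yields each bucket's predecessor, so concatenating the shifted slices gives every bucket its own contents plus the buckets before it.

```python
import torch
import torch.nn.functional as F

def look_around(x, backward=1, forward=0, pad_value=-1, dim=2):
    # x: (batch, buckets, bucket_size) -- we pad the bucket dimension
    t = x.shape[1]
    dims = (len(x.shape) - dim) * (0, 0)
    # `backward` buckets are padded on the LEFT, `forward` on the right,
    # so the slice starting at index 0 is each bucket's predecessor
    padded = F.pad(x, (*dims, backward, forward), value=pad_value)
    tensors = [padded[:, ind:(ind + t), ...] for ind in range(forward + backward + 1)]
    return torch.cat(tensors, dim=dim)

# one batch, three buckets of size two
x = torch.arange(6).reshape(1, 3, 2)
print(look_around(x, backward=1, forward=0))
# bucket 0 sees pad + itself, bucket 1 sees bucket 0 + itself, etc.:
# tensor([[[-1, -1,  0,  1],
#          [ 0,  1,  2,  3],
#          [ 2,  3,  4,  5]]])
```

Swapping the arguments to (forward, backward) as proposed would pad on the right instead, making each bucket attend to *future* buckets, which is the opposite of a backward look-around.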