gugundi / NeuralMachineTranslation

Neural Machine Translation using Local Attention

About code #1

Open N-Kingsley opened 5 years ago

N-Kingsley commented 5 years ago

Hi, I want to ask something about your code. In lines 176-188 of model.py, why do you do this:

    if start < self.window_size:
        d = self.window_size - start
        score[i, j, :d] = epsilon
    if end > li + self.window_size:
        d = (li + self.window_size) - end
        score[i, j, d:] = epsilon

Shouldn’t it check whether the selected window extends beyond the sentence length?

ChrisFugl commented 5 years ago

Hi @N-Kingsley. You are right that this part of the code should determine whether the selected window extends beyond the input sentence length - and that is what it does. The reason we enforce the window to be within the interval [window_size, li + window_size] (rather than [0, li]) is that we have padded the input sentence on both the left and the right as a performance optimisation. (Doing so allows us to compute local attention in batches.) We do not want the model to pay attention to the padded sides of the input sentence, so we set those attention scores to "epsilon" before applying a softmax over the scores.
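For anyone else reading this thread, here is a minimal sketch of the idea as I understand it - not the repository's actual implementation. The names `window_size`, `start`, `end`, `li`, and `epsilon` follow the snippet quoted above; everything else (`local_attention_weights`, the tensor shapes, and the dot-product scoring) is a hypothetical stand-in, and the concrete value of `epsilon` in model.py may differ from the large negative constant used here:

```python
import torch
import torch.nn.functional as F

def local_attention_weights(query, keys_padded, centers, lengths,
                            window_size, epsilon=-1e9):
    """Local attention over sentences padded by window_size on each side.

    query:       (batch, hidden) decoder state at the current step.
    keys_padded: (batch, src_len + 2 * window_size, hidden) encoder states,
                 padded with window_size positions on the left and right so
                 every window of size 2 * window_size + 1 is in bounds.
    centers:     (batch,) predicted alignment position in [0, src_len).
    lengths:     (batch,) true source lengths li, before padding.
    Returns attention weights of shape (batch, 2 * window_size + 1).
    """
    batch = query.size(0)
    span = 2 * window_size + 1
    weights = []
    for i in range(batch):
        # In padded coordinates the window around center p starts at
        # (p + window_size) - window_size == p, so start == centers[i].
        start = centers[i].item()
        end = start + span
        window = keys_padded[i, start:end]     # (span, hidden)
        score = window @ query[i]              # (span,) dot-product scores
        li = lengths[i].item()
        # Mask window positions that fall into the padding, as in the
        # snippet from the issue: valid positions in padded coordinates
        # are [window_size, li + window_size).
        if start < window_size:
            d = window_size - start            # overlap with left padding
            score[:d] = epsilon
        if end > li + window_size:
            d = (li + window_size) - end       # negative: count from the right
            score[d:] = epsilon
        weights.append(F.softmax(score, dim=0))
    return torch.stack(weights)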