gugundi / NeuralMachineTranslation

Neural Machine Translation using Local Attention

About pad with window #2

Closed heonly closed 2 years ago

heonly commented 3 years ago

I noticed the pad-with-window function in the encoder section:

def pad_with_window_size(self, batch):
    # Pad the sequence (first) dimension: window_size pad rows before the batch
    # and window_size + 1 pad rows after it (2 * window_size + 1 rows in total).
    size = batch.size()
    n = len(size)
    if n == 2:
        # (length, batch_size): token indices
        length, batch_size = size
        padded_length = length + (2 * self.window_size + 1)
        padded = torch.empty((padded_length, batch_size), dtype=torch.long, device=self.device)
        padded[:self.window_size, :] = self.pad
        padded[self.window_size:self.window_size + length, :] = batch
        padded[-(self.window_size + 1):, :] = self.pad
    elif n == 3:
        # (length, batch_size, hidden): encoder outputs
        length, batch_size, hidden = size
        padded_length = length + (2 * self.window_size + 1)
        padded = torch.empty((padded_length, batch_size, hidden), dtype=torch.long, device=self.device)
        padded[:self.window_size, :, :] = self.pad
        padded[self.window_size:self.window_size + length, :, :] = batch
        padded[-(self.window_size + 1):, :, :] = self.pad
    else:
        raise Exception(f'Cannot pad batch with {n} dimensions.')
    return padded

When calculating the attention, the padding with the window size is applied to the output of the encoder, i.e. the n == 3 case. Can this achieve the same effect? When I try to do this, it seriously hurts my results.
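For reference, a minimal sketch of how I understand this kind of window-size padding is meant to be consumed by local attention: after padding the encoder outputs, a window of width 2 * window_size + 1 can be sliced around any source position without running off either end of the tensor. The values of `window_size`, `pad`, and the window center `p` below are illustrative assumptions, not taken from the repository's attention code.

```python
import torch

window_size = 2
pad = 0
length, batch_size, hidden = 7, 4, 16

# Fake encoder outputs: (length, batch_size, hidden)
encoder_outputs = torch.randn(length, batch_size, hidden)

# Same padding layout as pad_with_window_size (n == 3 case), but using a
# float tensor here so the hidden states are preserved in this sketch.
padded_length = length + (2 * window_size + 1)
padded = torch.full((padded_length, batch_size, hidden), float(pad))
padded[window_size:window_size + length] = encoder_outputs

# For any window center p (0 <= p < length), the local window of width
# 2 * window_size + 1 stays in bounds after padding.
p = 0
window = padded[p:p + 2 * window_size + 1]
print(window.shape)  # torch.Size([5, 4, 16])
```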

ChrisFugl commented 3 years ago

I don't think I understand your question. What do you mean by "can this achieve the same effect"? Same effect when compared to what?