I noticed `pad_with_window_size` in the encoder section:
```python
def pad_with_window_size(self, batch):
    size = batch.size()
    n = len(size)
    if n == 2:
        # (length, batch_size): a batch of token IDs.
        length, batch_size = size
        padded_length = length + (2 * self.window_size + 1)
        padded = torch.empty((padded_length, batch_size),
                             dtype=torch.long, device=self.device)
        padded[:self.window_size, :] = self.pad                      # window_size rows of padding in front
        padded[self.window_size:self.window_size + length, :] = batch
        padded[-(self.window_size + 1):, :] = self.pad               # window_size + 1 rows of padding behind
    elif n == 3:
        # (length, batch_size, hidden): e.g. encoder outputs.
        length, batch_size, hidden = size
        padded_length = length + (2 * self.window_size + 1)
        padded = torch.empty((padded_length, batch_size, hidden),
                             dtype=torch.long, device=self.device)
        padded[:self.window_size, :, :] = self.pad
        padded[self.window_size:self.window_size + length, :, :] = batch
        padded[-(self.window_size + 1):, :, :] = self.pad
    else:
        raise Exception(f'Cannot pad batch with {n} dimensions.')
    return padded
```
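To make the behavior concrete, here is a minimal, self-contained sketch of the 2-D branch as a standalone function; the wrapper and the values `window_size=2` and `pad=0` are illustrative assumptions, not taken from the repository:

```python
import torch

def pad_with_window_size(batch, window_size=2, pad=0, device='cpu'):
    # Standalone version of the quoted method, 2-D case only.
    # window_size and pad are illustrative; in the repo they come from the model.
    length, batch_size = batch.size()
    padded_length = length + (2 * window_size + 1)
    padded = torch.empty((padded_length, batch_size),
                         dtype=torch.long, device=device)
    padded[:window_size, :] = pad                          # window_size rows of padding in front
    padded[window_size:window_size + length, :] = batch    # original tokens
    padded[-(window_size + 1):, :] = pad                   # window_size + 1 rows of padding behind
    return padded

# A batch of 5 token IDs for one sequence, shaped (length, batch_size).
batch = torch.arange(1, 6, dtype=torch.long).unsqueeze(1)
padded = pad_with_window_size(batch)
print(padded.squeeze(1).tolist())
# -> [0, 0, 1, 2, 3, 4, 5, 0, 0, 0]
```

Note the asymmetry: `window_size` rows of padding are placed in front but `window_size + 1` behind, so the padded length is `length + 2 * window_size + 1`.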
When calculating the attention, can applying this padding to the output of the encoder (i.e., the `n == 3` case) achieve the same effect? When I try this, it seriously hurts my results.
I don't think I understand your question. What do you mean by "can this achieve the same effect"? Same effect when compared to what?