Open greydog2020 opened 4 years ago
In model.py:

1. You cannot call `reshape([-1, self.sequence_length])` directly here; it should be `reshape([self.sequence_length, -1])` followed by a `permute` to swap the dimensions, or the dimensions should be swapped at the start. The reason this bug does not show up in practice is explained in point 2.

```python
def attention_net(self, lstm_output):
    output_reshape = torch.Tensor.reshape(lstm_output, [-1, self.hidden_size * self.layer_size])
    # print(output_reshape.size()) = (sequence_length * batch_size, hidden_size*layer_size)
    attn_tanh = torch.tanh(torch.mm(output_reshape, self.w_omega))
    # print(attn_tanh.size()) = (sequence_length * batch_size, attention_size)
    attn_hidden_layer = torch.mm(attn_tanh, torch.Tensor.reshape(self.u_omega, [-1, 1]))
    # print(attn_hidden_layer.size()) = (sequence_length * batch_size, 1)
    exps = torch.Tensor.reshape(torch.exp(attn_hidden_layer), [-1, self.sequence_length])
    # BUG: cannot reshape([-1, self.sequence_length]) directly; it should be
    # reshape([self.sequence_length, -1]) followed by permute, or the dimensions
    # should be swapped at the start -- the code below has the same kind of error
    # and must be changed accordingly
    # print(exps.size()) = (batch_size, sequence_length)
    alphas = exps / torch.Tensor.reshape(torch.sum(exps, 1), [-1, 1])
    # print(alphas.size()) = (batch_size, sequence_length)
    alphas_reshape = torch.Tensor.reshape(alphas, [-1, self.sequence_length, 1])
    # print(alphas_reshape.size()) = (batch_size, sequence_length, 1)
    state = lstm_output.permute(1, 0, 2)
    # print(state.size()) = (batch_size, sequence_length, hidden_size*layer_size)
    attn_output = torch.sum(state * alphas_reshape, 1)
    # every entry of alphas_reshape is always 0.0625
    # print(attn_output.size()) = (batch_size, hidden_size*layer_size)
    return attn_output
```
2. Your `u_omega` and `w_omega` have the wrong type: they should be `nn.Parameter`, and they should not be initialized to zero. As the code stands, `u_omega` and `w_omega` stay at zero forever, the optimizer never updates them, and every entry of `alphas_reshape` is always 0.0625.
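A minimal standalone sketch of both fixes, as I understand the issue. The class name, constructor arguments, and shapes below are made up for illustration and are not the repo's actual model.py; the key changes are `reshape([self.sequence_length, -1])` followed by `permute(1, 0)` (point 1), and registering `w_omega`/`u_omega` as randomly initialized `nn.Parameter`s (point 2):

```python
import torch
import torch.nn as nn

class AttnLSTM(nn.Module):
    """Hypothetical minimal module mirroring attention_net with both fixes applied."""

    def __init__(self, hidden_size, layer_size, sequence_length, attention_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.layer_size = layer_size
        self.sequence_length = sequence_length
        # Fix 2: trainable Parameters with non-zero (random) initialization,
        # so the optimizer actually updates them
        self.w_omega = nn.Parameter(torch.randn(hidden_size * layer_size, attention_size))
        self.u_omega = nn.Parameter(torch.randn(attention_size))

    def attention_net(self, lstm_output):
        # lstm_output: (sequence_length, batch_size, hidden_size*layer_size)
        output_reshape = lstm_output.reshape(-1, self.hidden_size * self.layer_size)
        attn_tanh = torch.tanh(torch.mm(output_reshape, self.w_omega))
        attn_hidden_layer = torch.mm(attn_tanh, self.u_omega.reshape(-1, 1))
        # Fix 1: the rows of attn_hidden_layer are ordered sequence-major, so
        # reshape to (sequence_length, batch) first, THEN permute to
        # (batch, sequence_length); reshaping straight to [-1, sequence_length]
        # would mix time steps from different batch elements into one row
        exps = torch.exp(attn_hidden_layer).reshape(self.sequence_length, -1).permute(1, 0)
        alphas = exps / torch.sum(exps, 1, keepdim=True)      # (batch, seq), rows sum to 1
        alphas_reshape = alphas.reshape(-1, self.sequence_length, 1)
        state = lstm_output.permute(1, 0, 2)                  # (batch, seq, hidden*layer)
        return torch.sum(state * alphas_reshape, 1)           # (batch, hidden*layer)
```

With this ordering, each row of `alphas` is a softmax over one sample's own time steps, so the weighted sum pools each sequence independently of the rest of the batch.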
What is used here is `Variable` from `torch.autograd`, not `nn.Variable`; it should still be able to compute gradients. Just changing `torch.zeros` to `torch.randn` should be enough.
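The constant 0.0625 follows directly from the zero initialization: with `w_omega = 0`, every attention score is `tanh(x @ 0) @ 0 = 0`, so the softmax is uniform, 1/sequence_length = 1/16 = 0.0625 for a sequence of 16. Worse, the gradients of both tensors are exactly zero at that point, so no optimizer step ever moves them. A small check (shapes are illustrative, assuming sequence_length 16):

```python
import torch

seq_len, batch, hidden, attn = 16, 2, 8, 4
lstm_output = torch.randn(seq_len, batch, hidden)
# zero initialization, as in the reported code
w_omega = torch.zeros(hidden, attn, requires_grad=True)
u_omega = torch.zeros(attn, requires_grad=True)

scores = torch.tanh(lstm_output.reshape(-1, hidden) @ w_omega) @ u_omega.reshape(-1, 1)
exps = torch.exp(scores).reshape(seq_len, -1).permute(1, 0)
alphas = exps / exps.sum(1, keepdim=True)      # every entry is 1/16 = 0.0625
attn_output = (lstm_output.permute(1, 0, 2) * alphas.unsqueeze(2)).sum(1)

attn_output.sum().backward()
print(alphas[0, 0].item())         # 0.0625
print(w_omega.grad.abs().max())    # tensor(0.) -> the optimizer never updates w_omega
```

The gradients vanish because every path from the loss back to `w_omega` passes through `u_omega` (which is zero), and every path back to `u_omega` passes through `tanh(x @ w_omega)` (which is zero at `w_omega = 0`); random initialization breaks this dead point.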
For reference, at the top of `attention_net`: `print(lstm_output.size())` = `(sequence_length, batch_size, hidden_size*layer_size)`