u784799i / biLSTM_attn


Some bugs I think I found #3

Open greydog2020 opened 4 years ago

greydog2020 commented 4 years ago

In model.py:

1. You cannot reshape directly to `[-1, self.sequence_length]`; you should reshape to `[self.sequence_length, -1]` and then swap the dimensions with `permute`, or swap the dimensions right at the start. The reason this bug does not show up is explained in point 2. The annotated function is below, and a corrected sketch follows it.

    def attention_net(self, lstm_output):
        # lstm_output.size() = (sequence_length, batch_size, hidden_size*layer_size)

        output_reshape = torch.Tensor.reshape(lstm_output, [-1, self.hidden_size*self.layer_size])
        # output_reshape.size() = (sequence_length * batch_size, hidden_size*layer_size)

        attn_tanh = torch.tanh(torch.mm(output_reshape, self.w_omega))
        # attn_tanh.size() = (sequence_length * batch_size, attention_size)

        attn_hidden_layer = torch.mm(attn_tanh, torch.Tensor.reshape(self.u_omega, [-1, 1]))
        # attn_hidden_layer.size() = (sequence_length * batch_size, 1)

        exps = torch.Tensor.reshape(torch.exp(attn_hidden_layer), [-1, self.sequence_length])
        # BUG: you cannot reshape directly to [-1, self.sequence_length]; it should be
        # reshaped to [self.sequence_length, -1] and then permuted, or the dimensions
        # should be swapped at the start. The lines below need matching changes and
        # contain similar errors.
        # exps.size() = (batch_size, sequence_length)

        alphas = exps / torch.Tensor.reshape(torch.sum(exps, 1), [-1, 1])
        # alphas.size() = (batch_size, sequence_length)

        alphas_reshape = torch.Tensor.reshape(alphas, [-1, self.sequence_length, 1])
        # alphas_reshape.size() = (batch_size, sequence_length, 1)

        state = lstm_output.permute(1, 0, 2)
        # state.size() = (batch_size, sequence_length, hidden_size*layer_size)

        attn_output = torch.sum(state * alphas_reshape, 1)  # alphas_reshape is always 0.0625
        # attn_output.size() = (batch_size, hidden_size*layer_size)

        return attn_output
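
To make point 1 concrete, here is a minimal sketch (not the repository's code) of how the attention weights could be computed with the reshape order fixed, assuming the same time-major layout of `lstm_output` as above; the helper name and its arguments are illustrative.

    import torch

    def attention_weights_fixed(attn_hidden_layer, sequence_length):
        # attn_hidden_layer: (sequence_length * batch_size, 1), laid out time-major,
        # i.e. flat row index = t * batch_size + b, because lstm_output was
        # (sequence_length, batch_size, hidden_size*layer_size).
        exps = torch.exp(attn_hidden_layer).reshape(sequence_length, -1)
        # Swap dimensions so each row now holds the scores of one batch element.
        exps = exps.permute(1, 0)                      # (batch_size, sequence_length)
        alphas = exps / exps.sum(dim=1, keepdim=True)  # normalize over time steps
        return alphas.unsqueeze(2)                     # (batch_size, sequence_length, 1)

This keeps the element ordering consistent with `state = lstm_output.permute(1, 0, 2)` further down, so each attention weight multiplies the hidden state of the same time step and the same example.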

2. The types of your u_omega and w_omega are wrong: they should be Parameter, and they should not be initialized to 0. The result of what you do now is that u_omega and w_omega stay 0 forever, the optimizer never updates them, and alphas_reshape is always 0.0625. A sketch of the fix is below.
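
As a rough sketch of the fix for point 2 (the class name and sizes here are illustrative, not the repository's exact constructor), the two weights could be registered as `nn.Parameter` and given a small random initialization so the optimizer actually updates them:

    import torch
    import torch.nn as nn

    class AttnWeights(nn.Module):  # hypothetical container, only to show registration
        def __init__(self, hidden_size, layer_size, attention_size):
            super().__init__()
            # nn.Parameter adds the tensors to module.parameters(), so they are
            # handed to the optimizer and updated during training.
            self.w_omega = nn.Parameter(torch.randn(hidden_size * layer_size, attention_size) * 0.1)
            self.u_omega = nn.Parameter(torch.randn(attention_size) * 0.1)

With a zero initialization, attn_tanh and attn_hidden_layer are all zeros, so after the exponential and normalization every weight becomes 1/sequence_length; the constant 0.0625 reported above would correspond to a sequence length of 16.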

yysirs commented 3 years ago

What is used here is Variable from torch.autograd, not nn.Variable. It should still be able to compute gradients. Just change torch.zeros to torch.randn and it should be fine.
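
A minimal sketch of that suggestion, with illustrative sizes (the exact shapes in model.py may differ):

    import torch
    from torch.autograd import Variable

    hidden_size, layer_size, attention_size = 32, 2, 16  # illustrative sizes

    # Same Variable wrapper as before, but initialized with torch.randn instead of
    # torch.zeros; requires_grad=True is needed if gradients should flow into them.
    w_omega = Variable(torch.randn(hidden_size * layer_size, attention_size), requires_grad=True)
    u_omega = Variable(torch.randn(attention_size), requires_grad=True)

Note that a plain Variable is not registered in model.parameters(), so it would still have to be passed to the optimizer explicitly, which is why point 2 above suggests nn.Parameter instead.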