hanbt / learn_dl

Deep learning algorithms source code for beginners
Apache License 2.0

Bug in the calc_delta_k method of rnn.py (recurrent neural network) #29

Open cyrixlin opened 5 years ago

cyrixlin commented 5 years ago

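First of all, many thanks to hanbingtao, author of 《零基础入门深度学习》 (a deep learning tutorial for beginners), for the hard work that went into such a good tutorial and its code.

While studying it, I found a clear error in rnn.py, the code for Part 5 of the tutorial (recurrent neural networks). I confirmed the problem with the gradient-check routine. The problem and a proposed fix are described below for the author's reference.

Original method: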

    def calc_delta_k(self, k, activator):
        '''
        Compute the delta at time k from the delta at time k+1
        '''
        state = self.state_list[k+1].copy()
        element_wise_op(self.state_list[k+1], activator.backward)
        self.delta_list[k] = np.dot(
            np.dot(self.delta_list[k+1].T, self.W),
            np.diag(state[:,0])).T

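There are two clear errors here:

  1. state should be self.state_list[k].copy(), not the element at k+1.
  2. After state is copied, element_wise_op is never applied to it. The copy itself should be passed to element_wise_op so that activator.backward is applied to it element by element.

Analysis: take state = self.state_list[k].copy(), then apply element_wise_op to obtain the array of activation-function derivatives. Multiplying the product of the error term at time k+1 and W by this time-k array gives the error term at time k.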
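The second point can be illustrated with a small sketch. It assumes that element_wise_op overwrites its argument in place; the helper below is a stand-in written for illustration, not the repository's code, and tanh_backward is a hypothetical activator derivative expressed through the activation output.

```python
import numpy as np

def element_wise_op(array, op):
    """Assumed helper: overwrite every element of `array` with op(element)."""
    with np.nditer(array, op_flags=['readwrite']) as it:
        for item in it:
            item[...] = op(item)

def tanh_backward(output):
    """Derivative of tanh written in terms of its output (illustrative)."""
    return 1 - output * output

# Stands in for one entry of state_list.
stored = np.array([[0.5], [-0.5]])

# Pattern in the original method: copy, then transform the stored array.
state = stored.copy()
element_wise_op(stored, tanh_backward)
print(state.ravel())   # the copy still holds the raw state values
print(stored.ravel())  # the stored state has been overwritten

# Pattern in the proposed fix: copy, then transform the copy.
stored_fixed = np.array([[0.5], [-0.5]])
state_fixed = stored_fixed.copy()
element_wise_op(state_fixed, tanh_backward)
print(state_fixed.ravel())   # derivatives, ready for np.diag
print(stored_fixed.ravel())  # stored state left untouched
```

Under that assumption, the original pattern feeds raw state values into np.diag and also corrupts the saved state, while the fixed pattern does neither.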
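The analysis can be written as one chain-rule step. The notation is an assumption based on the code rather than a quotation from the tutorial: the hidden state is taken to be $s_k = f(\mathrm{net}_k)$ with $\mathrm{net}_{k+1} = U x_{k+1} + W s_k$, and $\delta_k$ denotes $\partial E / \partial \mathrm{net}_k$.

```latex
\delta_k^{T}
  = \delta_{k+1}^{T}\,\frac{\partial\,\mathrm{net}_{k+1}}{\partial\,\mathrm{net}_k}
  = \delta_{k+1}^{T}\,\frac{\partial\,\mathrm{net}_{k+1}}{\partial s_k}\,
    \frac{\partial s_k}{\partial\,\mathrm{net}_k}
  = \delta_{k+1}^{T}\,W\,\mathrm{diag}\!\bigl(f'(\mathrm{net}_k)\bigr)
```

The derivative factor depends on $\mathrm{net}_k$, which is why the state at index k, not k+1, is the one that should be passed through activator.backward.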

Proposed fix:

    def calc_delta_k(self, k, activator):
        '''
        Compute the delta at time k from the delta at time k+1
        '''
        state = self.state_list[k].copy()
        element_wise_op(state, activator.backward)
        self.delta_list[k] = np.dot(
            np.dot(self.delta_list[k+1].T, self.W),
            np.diag(state[:,0])).T

Verification. The test data is changed as follows (inputs are 4-dimensional, and there are three input vectors):

    def data_set():
        x = [np.array([[1], [2], [3], [8]]),
             np.array([[2], [3], [4], [-9]]),
             np.array([[-1], [-2], [4], [3]])]
        d = np.array([[1], [2]])
        return x, d

The gradient-check routine is changed as follows (4-dimensional inputs, 3 hidden neurons, three input vectors):

    def gradient_check():
        '''
        Gradient check
        '''
        # Define an error function: the sum of all output values
        error_function = lambda o: o.sum()

        rl = RecurrentLayer(4, 3, IdentityActivator(), 1e-3)

        # Forward pass
        x, d = data_set()
        rl.forward(x[0])
        rl.forward(x[1])
        rl.forward(x[2])

        # Build the sensitivity map
        sensitivity_array = np.ones(rl.state_list[-1].shape,
                                    dtype=np.float64)
        # Compute the gradient
        rl.backward(sensitivity_array, IdentityActivator())

        # Check the gradient
        epsilon = 10e-4
        for i in range(rl.W.shape[0]):
            for j in range(rl.W.shape[1]):
                rl.W[i,j] += epsilon
                rl.reset_state()
                rl.forward(x[0])
                rl.forward(x[1])
                rl.forward(x[2])
                err1 = error_function(rl.state_list[-1])
                rl.W[i,j] -= 2*epsilon
                rl.reset_state()
                rl.forward(x[0])
                rl.forward(x[1])
                rl.forward(x[2])
                err2 = error_function(rl.state_list[-1])
                expect_grad = (err1 - err2) / (2 * epsilon)
                rl.W[i,j] += epsilon
                print 'weights(%d,%d): expected - actural %f - %f' % (
                    i, j, expect_grad, rl.gradient[i,j])

Output with the original calc_delta_k:

    D:\python_2.7\python.exe D:/python_code/learn_dl-master/rnn.py
    weights(0,0): expected - actural 0.000095 - 1.000000
    weights(0,1): expected - actural 0.000372 - 1.000000
    weights(0,2): expected - actural 0.000512 - 1.000000
    weights(1,0): expected - actural 0.000095 - 1.000000
    weights(1,1): expected - actural 0.000372 - 1.000000
    weights(1,2): expected - actural 0.000512 - 1.000000
    weights(2,0): expected - actural 0.000095 - 1.000000
    weights(2,1): expected - actural 0.000372 - 1.000000
    weights(2,2): expected - actural 0.000512 - 1.000000

Process finished with exit code 0

Output with the modified calc_delta_k:

    D:\python_2.7\python.exe D:/python_code/learn_dl-master/rnn.py
    weights(0,0): expected - actural -0.001360 - -0.001360
    weights(0,1): expected - actural 0.000520 - 0.000520
    weights(0,2): expected - actural 0.000452 - 0.000452
    weights(1,0): expected - actural -0.001360 - -0.001360
    weights(1,1): expected - actural 0.000520 - 0.000520
    weights(1,2): expected - actural 0.000452 - 0.000452
    weights(2,0): expected - actural -0.001360 - -0.001360
    weights(2,1): expected - actural 0.000520 - 0.000520
    weights(2,2): expected - actural 0.000452 - 0.000452

Process finished with exit code 0

This shows that the original calc_delta_k function is incorrect and that the modified version is correct.
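A gradient check of this kind can also be sketched without the repository's classes. The following is an independent Python 3 / NumPy illustration, not the tutorial's code: it assumes a recurrence s_t = tanh(U x_t + W s_{t-1}) with error E = sum(s_T), and all names (forward, grad_W, numerical_grad_W, use_step_k) are hypothetical. It isolates only the indexing question, comparing a backward pass that takes the activation derivative from step k against one that takes it from step k+1.

```python
import numpy as np

def forward(U, W, xs):
    """Return [s_0, ..., s_T] with s_0 = 0 and s_t = tanh(U x_t + W s_{t-1})."""
    states = [np.zeros((W.shape[0], 1))]
    for x in xs:
        states.append(np.tanh(U @ x + W @ states[-1]))
    return states

def grad_W(U, W, xs, use_step_k=True):
    """Backpropagation-through-time gradient of E = sum(s_T) w.r.t. W."""
    states = forward(U, W, xs)
    T = len(xs)
    # dE/dnet_T for E = sum(tanh(net_T)), written through the output s_T.
    delta = 1.0 - states[T] ** 2
    grad = delta @ states[T - 1].T
    for k in range(T - 1, 0, -1):
        # Derivative taken from step k (proposed fix) or step k+1.
        s = states[k] if use_step_k else states[k + 1]
        deriv = 1.0 - s ** 2
        delta = (delta.T @ W @ np.diag(deriv[:, 0])).T
        grad += delta @ states[k - 1].T
    return grad

def numerical_grad_W(U, W, xs, eps=1e-5):
    """Central-difference estimate of dE/dW."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            e_plus = forward(U, W_plus, xs)[-1].sum()
            e_minus = forward(U, W_minus, xs)[-1].sum()
            grad[i, j] = (e_plus - e_minus) / (2 * eps)
    return grad

# Small random problem: 4-dimensional inputs, 3 hidden units, 3 time steps.
rng = np.random.default_rng(0)
U = rng.uniform(-0.5, 0.5, size=(3, 4))
W = rng.uniform(-0.5, 0.5, size=(3, 3))
xs = [rng.normal(size=(4, 1)) for _ in range(3)]

expected = numerical_grad_W(U, W, xs)
err_k = np.abs(grad_W(U, W, xs, use_step_k=True) - expected).max()
err_k1 = np.abs(grad_W(U, W, xs, use_step_k=False) - expected).max()
print("max abs error, derivative from step k:  ", err_k)
print("max abs error, derivative from step k+1:", err_k1)
```

A nonlinear activation is used here on purpose: with an identity activation the derivative is 1 at every step, so the choice of index alone would not change the result.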

GSD-Dreammark commented 5 years ago

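There is indeed a problem here; the code clearly does not match Formula 3 in the article. A question: in the gradient_check method in bp.py, the network error is computed as

    network_error = lambda vec1, vec2: 0.5 * reduce(lambda a, b: a + b, map(lambda v: (v[0] - v[1]) * (v[0] - v[1]), zip(vec1, vec2)))

Why is this the formula?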
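Assuming the quoted expression is meant to read 0.5 * reduce(...) with (v[0] - v[1]) * (v[0] - v[1]) inside the map (the multiplication signs appear to have been lost in formatting), it computes half the sum of squared differences between the two vectors. The sketch below, in Python 3, compares it with a direct version; half_sum_squared_error and the sample values are made up for illustration.

```python
from functools import reduce  # built in on Python 2, in functools on Python 3

# The quoted expression, with the multiplication signs written out.
network_error = lambda vec1, vec2: 0.5 * reduce(
    lambda a, b: a + b,
    map(lambda v: (v[0] - v[1]) * (v[0] - v[1]), zip(vec1, vec2)))

def half_sum_squared_error(outputs, labels):
    """E = 1/2 * sum_i (outputs_i - labels_i)^2, written directly."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, labels))

# Made-up sample values.
outputs = [0.9, 0.1, 0.4]
labels = [1.0, 0.0, 0.5]
print(network_error(outputs, labels))
print(half_sum_squared_error(outputs, labels))
```

The factor 0.5 is a common convention for squared-error losses, since it cancels the 2 that appears when the square is differentiated.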

JnuSimba commented 1 year ago

@cyrixlin Does this correspond to Formula 4? If so, should self.W be changed to self.U?