oreilly-japan / deep-learning-from-scratch

『ゼロから作る Deep Learning』(O'Reilly Japan, 2016)
MIT License

I'm confused with the numerical_gradient function used in TwoLayerNet class in Chapter4 #61

Closed. Dongsheng600 closed this issue 3 years ago.

Dongsheng600 commented 3 years ago

Here is the part of the code my question is about:

In ch04/two_layer_net.py

def numerical_gradient(self, x, t):
    loss_W = lambda W: self.loss(x, t)

    grads = {}
    grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
    grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
    grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
    grads['b2'] = numerical_gradient(loss_W, self.params['b2'])

    return grads

In common/gradient.py

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)

        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val # restore the original value
        it.iternext()   

    return grad

I'm not sure whether the parameter W defined in the lambda function loss_W is meaningful. How does the f(x) call in numerical_gradient() in gradient.py work? Since loss_W never reads its argument, I think it will not affect the value of the loss function.

propella commented 1 year ago

Although this issue is closed, I'd like to leave a comment for future readers, as I was confused at exactly the same place.

The parameter W isn't used inside loss_W, so it doesn't look meaningful. However, numerical_gradient() still uses its second argument to calculate f(x+h) and f(x-h). The trick is that the second argument, such as self.params['W1'], is modified in place within numerical_gradient(), which changes the state of self.params['W1'] itself. So the perturbation does affect the value of self.loss(x, t).
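
To make the in-place mechanism concrete, here is a minimal sketch. The TinyNet class and its single toy weight are made up for illustration and are not part of the book's code; it only mimics what numerical_gradient() does to one element. Because x is the very same ndarray object stored in the params dictionary, writing to x[0] changes what self.loss() sees, even though the lambda never reads its W argument.

import numpy as np

class TinyNet:
    def __init__(self):
        # a single hypothetical weight; the real net holds W1, b1, W2, b2
        self.params = {'W1': np.array([3.0])}

    def loss(self):
        # toy loss L = W1^2, so dL/dW1 = 2 * W1 = 6.0 at W1 = 3.0
        return float(self.params['W1'][0] ** 2)

net = TinyNet()
loss_W = lambda W: net.loss()      # W is ignored, just like in the book

x = net.params['W1']               # x is the SAME array object, not a copy
h = 1e-4
x[0] = 3.0 + h                     # this write is visible through net.params['W1']
fxh1 = loss_W(x)                   # loss evaluated at W1 + h
x[0] = 3.0 - h
fxh2 = loss_W(x)                   # loss evaluated at W1 - h
x[0] = 3.0                         # restore the original value

print((fxh1 - fxh2) / (2 * h))     # ~6.0, matching dL/dW1 = 2 * W1

If the array were copied instead (for example, x = net.params['W1'].copy()), the writes would no longer reach self.params, every f(x+h) and f(x-h) would be equal, and the gradients would all come out as zero, which is exactly the failure the question anticipates.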