tensorlayer / TensorLayer

Deep Learning and Reinforcement Learning Library for Scientists and Engineers
http://tensorlayerx.com

Problem about tutorial_AC: why should 'v_' be placed here? #1071

Closed 10ca1h0st closed 4 years ago

10ca1h0st commented 4 years ago

Hello, I am new to deep reinforcement learning. Here is my question; please help me.

The function is in examples/reinforcement_learning/tutorial_AC.py:

    def learn(self, state, reward, state_):
        v_ = self.model(np.array([state_]))
        with tf.GradientTape() as tape:
            v = self.model(np.array([state]))
            # TD_error = r + lambda * V(newS) - V(S)
            td_error = reward + LAM * v_ - v
            loss = tf.square(td_error)
        grad = tape.gradient(loss, self.model.trainable_weights)
        self.optimizer.apply_gradients(zip(grad, self.model.trainable_weights))
        return td_error

My question is why 'v_' is computed above 'with tf.GradientTape() as tape'. When I move 'v_ = self.model(np.array([state_]))' inside the with block, the model does not converge.

Can anyone help me? Thanks. :)

quantumiracle commented 4 years ago

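'v_' is the target value, which is not supposed to be optimized. That is why it is called the target, like the label in supervised learning. Because it is computed outside 'with tf.GradientTape() as tape', the tape does not record it, so it acts as a constant and the gradient flows only through 'v'. If you compute 'v_' inside the with block, the gradient also flows through the target, and the update changes the target along with the prediction.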
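To make the difference concrete, here is a minimal sketch in plain Python, assuming a toy linear critic V(s) = w * s with a single weight. The names `GAMMA`, `semi_gradient`, and `full_gradient` and the sample numbers are hypothetical and not from the tutorial; `GAMMA` plays the role of `LAM` in the snippet above. It compares the gradient of the squared TD error when the target is held constant (like 'v_' outside the tape) with the gradient when the target is differentiated too (like 'v_' inside the tape).

```python
# Hypothetical toy critic: V(s) = w * s, with one trainable weight w.
GAMMA = 0.9  # discount factor (assumed value; corresponds to LAM in the tutorial)


def td_error(w, s, r, s_next):
    """TD error: r + GAMMA * V(s_next) - V(s)."""
    return r + GAMMA * w * s_next - w * s


def semi_gradient(w, s, r, s_next):
    """d(td_error**2)/dw with the target r + GAMMA * V(s_next) held constant.

    This mirrors computing v_ outside the GradientTape.
    """
    return 2.0 * td_error(w, s, r, s_next) * (-s)


def full_gradient(w, s, r, s_next):
    """d(td_error**2)/dw when the target is differentiated as well.

    This mirrors computing v_ inside the GradientTape.
    """
    return 2.0 * td_error(w, s, r, s_next) * (GAMMA * s_next - s)


# Assumed sample transition, chosen only for illustration.
w, s, r, s_next = 0.5, 1.0, 1.0, 2.0
g_semi = semi_gradient(w, s, r, s_next)
g_full = full_gradient(w, s, r, s_next)
print("target held constant:", g_semi)
print("target differentiated:", g_full)
```

For this transition the two gradients have opposite signs. With the target held constant, a gradient-descent step raises w so that V(s) moves toward the target. With the target differentiated, the step lowers w, which shrinks the error partly by dragging the target down instead. That can stall or destabilize learning, which is consistent with the behaviour described in the question. In TensorFlow, wrapping the target in `tf.stop_gradient` inside the tape should have the same effect as computing it outside.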