Hello, I am a newbie to RL.
Here is my question. Please help me.
The function is in examples/reinforcement_learning/tutorialAC.py:
def learn(self, state, reward, state_):
    # value of the next state, computed outside the tape
    v_ = self.model(np.array([state_]))
    with tf.GradientTape() as tape:
        v = self.model(np.array([state]))
        # TD_error = r + lambda * V(newS) - V(S)
        td_error = reward + LAM * v_ - v
        loss = tf.square(td_error)
    grad = tape.gradient(loss, self.model.trainable_weights)
    self.optimizer.apply_gradients(zip(grad, self.model.trainable_weights))
    return td_error
My question is: why is 'v_' computed above the 'with tf.GradientTape() as tape' block?
When I move 'v_ = self.model(np.array([state_]))' into the with context, the model does not converge.
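To make the placement question concrete, here is a toy NumPy-only sketch I wrote (the linear value function, weights, and states are my own placeholders, not the tutorial's model). It compares the hand-derived gradient of loss = (r + lam*V(s_) - V(s))^2 when V(s_) is held constant (the pattern above, where v_ is computed outside the tape) versus when V(s_) is also differentiated (what happens when the call moves inside the tape):

```python
import numpy as np

# Toy linear value function V(s) = w . s  (placeholder, not the tutorial's network)
w = np.array([0.5, -0.3])   # assumed weights
s = np.array([1.0, 2.0])    # current state (made up)
s_ = np.array([0.0, 1.0])   # next state (made up)
r, lam = 1.0, 0.9

v = w @ s
v_ = w @ s_
td = r + lam * v_ - v       # TD error
loss = td ** 2

# Gradient w.r.t. w when v_ is treated as a constant target
# (v_ computed outside the tape): d/dw (td^2) = 2*td * (-s)
grad_target_fixed = 2 * td * (-s)

# Gradient w.r.t. w when v_ also depends on w
# (v_ computed inside the tape): d/dw (td^2) = 2*td * (lam*s_ - s)
grad_target_moving = 2 * td * (lam * s_ - s)

print(grad_target_fixed)
print(grad_target_moving)
```

The two gradients differ by the extra 2*td*lam*s_ term, which is the difference I am asking about.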
Who can help me? Thanks. :)