Closed Gonm1 closed 4 years ago
I think that, since this is DQN, you use a target network for the predictions at every step and call .fit on the q_eval model for the learning, also at every step. This is supposed to help with stability.
https://github.com/philtabor/Youtube-Code-Repository/blob/d70e8cfb640b648d115bd32941549182309d8366/ReinforcementLearning/DeepQLearning/dqn_keras.py#L82
The improvement in stability comes from using the target network, rather than the network being trained, to evaluate the value of the new states in the learning function.
I recommend reading the Nature DQN paper to clarify this issue.
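To make the point about the new states concrete, here is a minimal NumPy sketch of how the training targets could be built. It is not the code from the linked repository: the function name `dqn_targets` and its argument names are hypothetical, and the two Q-value arrays are assumed to come from something like `q_eval.predict(states)` and a separate target model's `predict(new_states)`.

```python
import numpy as np

def dqn_targets(q_eval_states, q_target_next, actions, rewards, dones, gamma=0.99):
    """Build regression targets for one minibatch (hypothetical helper).

    q_eval_states: (batch, n_actions) Q-values of the current states,
                   assumed to come from the network being trained.
    q_target_next: (batch, n_actions) Q-values of the new states,
                   assumed to come from the target network.
    """
    # Start from the online predictions so untaken actions contribute zero error.
    targets = q_eval_states.copy()
    batch_index = np.arange(len(actions))
    # Bootstrap from the target network; terminal transitions get no future value.
    max_next = q_target_next.max(axis=1)
    targets[batch_index, actions] = rewards + gamma * max_next * (1.0 - dones)
    return targets

# Toy minibatch: two transitions, two actions each.
q_eval_states = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target_next = np.array([[3.0, 4.0], [9.0, 7.0]])
actions = np.array([1, 0])
rewards = np.array([1.0, -1.0])
dones = np.array([0.0, 1.0])

targets = dqn_targets(q_eval_states, q_target_next, actions, rewards, dones, gamma=0.5)
```

In a Keras setup, `targets` would then be passed to `.fit` on the q_eval model. The target model is only refreshed occasionally, for example by copying weights with `set_weights(q_eval.get_weights())` every fixed number of steps; the exact interval is a tuning choice and is not taken from the linked file.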