Open atlevesque opened 4 years ago
Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old 6 GB RAM PC 😞
Hi, did you make any other changes before running main_ddqn.py? When I tried to run it, the agent was not learning and the average scores were around -17.0. Can you let me know how you were able to obtain appropriate results?
Thanks in advance!
When running the DDQN agent on PyTorch v1.5.0 I get the following RuntimeError:

RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch. inconsistent range for TensorList output (copy_range at ..\torch\csrc\autograd\generated\Functions.cpp:57) (no backtrace available)
My guess is that there is a diamond-shaped dependency when running the backward method, as the `self.q_eval` network parameters affect the loss via both `q_pred` and `q_eval`. I fixed the issue by explicitly detaching the `max_actions` tensor from the computational graph: it is a discrete value, and small changes in the `self.q_eval` network parameters should not change the `max_actions` taken. The derivative of the loss with respect to the `self.q_eval` network parameters thus comes only from the `q_pred` calculation. I tested this change on my computer, got good performance, and (more importantly) didn't get the RuntimeError.
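To make the fix concrete, here is a minimal sketch of the DDQN update it describes. The two `nn.Linear` networks, the batch tensors, and all variable names are hypothetical stand-ins, not the actual code from main_ddqn.py; the point is only the `.detach()` before the argmax, so that action selection through `q_eval` sits outside the autograd graph and the loss gradient flows solely through `q_pred`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the online and target networks.
q_eval = nn.Linear(4, 2)
q_next = nn.Linear(4, 2)

# Hypothetical replay-memory batch.
batch_size = 8
states = torch.randn(batch_size, 4)
next_states = torch.randn(batch_size, 4)
actions = torch.randint(0, 2, (batch_size,))
rewards = torch.randn(batch_size)
gamma = 0.99

indices = torch.arange(batch_size)

# Q-values of the actions actually taken; gradients flow through here.
q_pred = q_eval(states)[indices, actions]

# DDQN action selection uses q_eval on the next states, but the chosen
# actions are discrete, so detach before taking the argmax: small changes
# in q_eval's parameters should not change max_actions.
max_actions = q_eval(next_states).detach().argmax(dim=1)

# Evaluate the selected actions with the target network; detach so the
# target is treated as a constant in the loss.
q_target = rewards + gamma * q_next(next_states).detach()[indices, max_actions]

loss = F.mse_loss(q_pred, q_target)
loss.backward()  # gradients reach q_eval only via q_pred
```

With this change, `q_eval.weight.grad` is populated through `q_pred` alone, while the target network receives no gradient, which matches the behaviour described above.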