philtabor / Deep-Q-Learning-Paper-To-Code


Update ddqn_agent.py to prevent RuntimeError with newer PyTorch version #3

Open · atlevesque opened this issue 4 years ago

atlevesque commented 4 years ago

When running the DDQN agent on PyTorch v1.5.0, I get the following RuntimeError:

RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch. inconsistent range for TensorList output (copy_range at ..\torch\csrc\autograd\generated\Functions.cpp:57) (no backtrace available)

My guess is that there is a diamond-shaped dependency in the backward pass, since the self.q_eval network parameters affect the loss along two paths: through q_pred and through q_eval.
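A toy illustration of the dependency structure described here (hypothetical shapes and names, not the course code): the online network reaches the loss twice, directly through q_pred and indirectly through the argmax used to index the target network's output.

```python
import torch as T
import torch.nn as nn

q_eval_net = nn.Linear(4, 2)   # stands in for self.q_eval (online network)
q_next_net = nn.Linear(4, 2)   # stands in for self.q_next (target network)
state, next_state = T.randn(1, 4), T.randn(1, 4)

q_pred = q_eval_net(state)[:, 0]       # online net, path 1 (differentiable)
q_eval = q_eval_net(next_state)        # online net again, path 2
max_actions = T.argmax(q_eval, dim=1)  # action selection rides on path 2
q_next = q_next_net(next_state)        # target net supplies the value

q_target = 1.0 + 0.99 * q_next[0, max_actions]
loss = ((q_target - q_pred) ** 2).mean()
loss.backward()  # q_eval_net's parameters sit on both branches of the graph
```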

I fixed the issue by explicitly detaching the max_actions tensor from the computational graph: it is a discrete value, and small changes in the self.q_eval network parameters should not change which actions are selected. The derivative of the loss with respect to the self.q_eval network parameters then comes only from the q_pred calculation.
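A minimal sketch of what the patched learn() step might look like. The variable names (q_pred, q_next, q_eval, max_actions) come from this issue; the agent attributes (memory, sample_memory, gamma, the optimizer and loss hanging off q_eval) are assumptions about the course code, not a verbatim copy of it.

```python
import torch as T

def learn(self):
    # Attribute names below are guesses at the course code, not verbatim.
    if self.memory.mem_cntr < self.batch_size:
        return

    self.q_eval.optimizer.zero_grad()
    states, actions, rewards, states_, dones = self.sample_memory()
    indices = T.arange(self.batch_size)

    q_pred = self.q_eval.forward(states)[indices, actions]  # gradient path
    q_next = self.q_next.forward(states_)                   # target-network values
    q_eval = self.q_eval.forward(states_)                   # online net on next states

    # The fix described above: detach max_actions so that action selection
    # is cut out of the autograd graph; gradients reach self.q_eval only
    # through q_pred.
    max_actions = T.argmax(q_eval, dim=1).detach()

    q_next[dones] = 0.0
    q_target = rewards + self.gamma * q_next[indices, max_actions]

    loss = self.q_eval.loss(q_target, q_pred).to(self.q_eval.device)
    loss.backward()
    self.q_eval.optimizer.step()
```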

I tested this change on my computer and got good performance and (more importantly) didn't get the RuntimeError.

atlevesque commented 4 years ago

Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old 6 GB RAM PC 😞

[Plot: DDQNAgent_PongNoFrameskip-v4_lr0.0001_500games]
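For context on the memory constraint, a back-of-the-envelope count, assuming the buffer stores both the state and next state as float32 stacks of four 84×84 frames and a buffer length of 50,000 (all of these are illustrative assumptions, not confirmed course defaults):

```python
# Rough replay-buffer footprint; every value here is an assumption
# for illustration, not the course's actual configuration.
frames, height, width = 4, 84, 84              # stacked Atari observation
bytes_per_state = frames * height * width * 4  # float32 storage
bytes_per_transition = 2 * bytes_per_state     # state and next state
mem_size = 50_000                              # hypothetical buffer length

print(bytes_per_transition * mem_size / 2**30)  # ~10.5 GiB, well over 6 GB
```

Shrinking mem_size (or storing frames as uint8) brings this down proportionally.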

srikanthkb commented 4 years ago

> Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old 6 GB RAM PC 😞
>
> [Plot: DDQNAgent_PongNoFrameskip-v4_lr0.0001_500games]

Hi, did you make any other changes before running main_ddqn.py? When I tried to run it, the agent is not learning and the average scores are around -17.0. Can you let me know how you were able to obtain good results?

Thanks in advance!