Open dylanthomas opened 7 years ago
A torch.nn.Module expects a torch.autograd.Variable as input. But policy (which is a subclass of torch.nn.Module) was being passed a numpy.ndarray, so the numpy.ndarray has to be converted to a Variable first. I fixed this problem; see my commit 9e9fb687786a025061561c7260ba9b586e9ca4ce.
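For readers hitting the same error, the conversion described above might look roughly like the sketch below. It is not the code from the commit: the helper name `to_variable`, the stand-in `Conv2d` layer, and the 4x84x84 observation shape are assumptions made for illustration. On PyTorch 0.4 and later, `Variable` is merged into `Tensor`, so the `Variable(...)` wrapper is effectively a no-op there and is kept only to mirror the error message.

```python
import numpy as np
import torch
from torch.autograd import Variable


def to_variable(obs):
    """Convert one numpy observation into a batched float Variable."""
    # Cast to float32 (conv layers expect floats) and make the memory contiguous.
    arr = np.ascontiguousarray(obs, dtype=np.float32)
    # Add a leading batch dimension: (C, H, W) -> (1, C, H, W).
    return Variable(torch.from_numpy(arr).unsqueeze(0))


# Hypothetical usage with a stand-in convolution layer (not the repo's policy).
conv = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3)
obs = np.zeros((4, 84, 84), dtype=np.uint8)
out = conv(to_variable(obs))
```

Passing `obs` straight to `conv` is what raises the "expected a Variable argument" error on old PyTorch versions; wrapping it first avoids that.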
Many thanks.
On another note, when I ran Breakout-v0, the reward I got after 10M steps was 30~40M. Shouldn't it be around 400 according to DeepMind's paper? I wonder where the difference is coming from. Any thoughts or insight on this?
There are some differences between my code and DeepMind's paper. My code is
That's why the result was not good enough, I think.
Thank you for your reply. Two points:
1. On the parameter settings, are you aware of this wiki (https://github.com/muupan/async-rl/wiki)?
2. On the performance issue of the TensorFlow implementation, have you seen this discussion (https://github.com/dennybritz/reinforcement-learning/issues/30)? It is about DQN, but the same issues are supposed to be the root cause on the A3C side as well.
There, cgel suggests that the following points are key:
Important stuff:
- Normalise input [0,1]
- Clip rewards [0,1]
- don't tf.reduce_mean the losses in the batch. Use tf.reduce_max
- initialise properly the network with xavier init
- use the optimizer that the paper uses. It is not same RMSProp as in tf
Has your code incorporated all the points above?
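Two of the points in cgel's list, input normalisation and reward clipping, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `normalise_frame` and `clip_reward` are hypothetical, frames are assumed to be uint8 images with values in 0..255, and the clipping bounds default to the [0,1] range quoted above but are left as parameters because other setups use different ranges.

```python
import numpy as np


def normalise_frame(frame):
    """Scale uint8 pixel values (0..255) to float32 values in [0, 1]."""
    return np.asarray(frame, dtype=np.float32) / 255.0


def clip_reward(reward, low=0.0, high=1.0):
    """Clamp a scalar reward into [low, high]."""
    return float(np.clip(reward, low, high))


# Hypothetical usage on a tiny 2x2 "frame" and a few raw rewards.
frame = np.array([[0, 255], [51, 102]], dtype=np.uint8)
scaled = normalise_frame(frame)
clipped = [clip_reward(r) for r in (7, -3, 0.5)]
```

The remaining points (loss reduction, Xavier initialisation, the paper's RMSProp variant) depend on the framework in use and are not covered by this sketch.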
@dylanthomas did you try running Breakout-v0 for longer than 10M timesteps to see if the average reward eventually got to >400? For example, it took Muupan's A3C (https://github.com/muupan/async-rl#a3c-ff) 20M timesteps to start getting to >400.
Not yet, but I will run this code for 20M to see if it goes up to 400. @ethancaballero
I am new to PyTorch. I just cloned your code and ran it, but got an error. I hope you can point me in the right direction to fix this issue.
More specifics:
=== File "test_a3c.py", line 71, in <module>
test(policy, args)
File "test_a3c.py", line 25, in test
p, v = policy(o)
...
File "/home/john/anaconda3/envs/th/lib/python3.6/site-packages/torch/nn/functional.py", line 37, in conv2d
return f(input, weight, bias) if bias is not None else f(input, weight)
RuntimeError: expected a Variable argument, but got numpy.ndarray
Could you tell me what the issue(s) could be here?
Many thanks,
John