tigerneil closed this issue 7 years ago.
I installed PyTorch from conda and cannot reproduce your problem.
(pytorch) ➜ reinforcement_learning git:(master) python actor_critic.py
[2017-07-09 13:09:50,068] Making new env: CartPole-v0
Episode 10 Last length: 132 Average length: 14.91
Episode 20 Last length: 21 Average length: 15.53
Episode 30 Last length: 50 Average length: 16.97
Episode 40 Last length: 199 Average length: 21.94
Episode 50 Last length: 26 Average length: 25.18
Episode 60 Last length: 35 Average length: 26.79
Episode 70 Last length: 43 Average length: 28.10
I pulled some new code and it seems OK now. :) @lynic
I tested the RL examples: reinforce.py works, but actor_critic.py fails. Any ideas? I have the latest version of PyTorch installed:
pip install http://download.pytorch.org/whl/torch-0.1.12.post2-cp35-cp35m-macosx_10_7_x86_64.whl
(pytorch3) ➜ reinforcement_learning git:(master) python reinforce.py
[2017-05-08 22:53:19,926] Making new env: CartPole-v0
Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 127 Average length: 42.35
Episode 90 Last length: 154 Average length: 52.03
Episode 100 Last length: 1400 Average length: 120.72
Solved! Running reward is now 291.6995857731951 and the last episode runs to 9439 time steps!
(pytorch3) ➜ reinforcement_learning git:(master) python actor_critic.py
[2017-05-08 22:50:58,181] Making new env: CartPole-v0
Traceback (most recent call last):
File "actor_critic.py", line 98, in
finish_episode()
File "actor_critic.py", line 74, in finish_episode
action.reinforce(r - value.data.squeeze())
File "/Users/Tiger/anaconda/envs/pytorch3/lib/python3.5/site-packages/torch/autograd/variable.py", line 200, in reinforce
self.creator._reinforce(reward)
File "/Users/Tiger/anaconda/envs/pytorch3/lib/python3.5/site-packages/torch/autograd/stochastic_function.py", line 41, in _reinforce
'x'.join(map(str, self.reward_info[1]))))
ValueError: got reward of size 1, but expected a tensor of size 1x1
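The mismatch can be reproduced without PyTorch at all. The sketch below is a hypothetical, stdlib-only model of the size check: shapes are plain tuples, and `squeeze_shape`, `fmt` and `check_reward` are illustrative names, not PyTorch functions. It assumes that `r` is a plain Python number and that the critic output `value` is 1x1 like the sampled action. Under those assumptions, squeezing `value` leaves a reward of size 1 while the action expects 1x1, which is exactly the reported error.

```python
# Hypothetical, stdlib-only model of the size check that raised the error above.
# Shapes are plain tuples; this is an illustration, not PyTorch's implementation.


def squeeze_shape(shape):
    """Drop size-1 dims. Assumption: as the error message suggests for this
    PyTorch version, the result keeps at least one dim, so (1, 1) -> (1,)."""
    kept = tuple(d for d in shape if d != 1)
    return kept or (1,)


def fmt(shape):
    """Format a shape the way the error message does, e.g. (1, 1) -> '1x1'."""
    return "x".join(map(str, shape))


def check_reward(reward_shape, action_shape):
    """Raise if the reward shape differs from the sampled action's shape."""
    if reward_shape != action_shape:
        raise ValueError(
            "got reward of size {}, but expected a tensor of size {}".format(
                fmt(reward_shape), fmt(action_shape)))


action_shape = (1, 1)  # assumed: one sampled action for a batch of one state
value_shape = (1, 1)   # assumed: critic output V(s) for the same state

# Assuming r is a plain number, r - value.data.squeeze() takes the squeezed shape.
reward_shape = squeeze_shape(value_shape)  # (1,)

try:
    check_reward(reward_shape, action_shape)
except ValueError as err:
    print(err)

# Keeping the reward at the action's shape (1x1) passes the check.
check_reward(value_shape, action_shape)
print("shapes match")
```

The only point of the model is that the reward handed to `reinforce` must have the same shape as the sampled action; how the example script was actually changed upstream is not shown in this thread.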