pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples
BSD 3-Clause "New" or "Revised" License

actor critic example failed #159

Closed tigerneil closed 7 years ago

tigerneil commented 7 years ago

I tested the reinforcement learning examples, but the actor-critic one failed. Any comments? I have the latest version of PyTorch installed.


pip install http://download.pytorch.org/whl/torch-0.1.12.post2-cp35-cp35m-macosx_10_7_x86_64.whl

(pytorch3) ➜ reinforcement_learning git:(master) python reinforce.py
[2017-05-08 22:53:19,926] Making new env: CartPole-v0
Episode 10      Last length:    13      Average length: 10.64
Episode 20      Last length:    24      Average length: 11.37
Episode 30      Last length:   115      Average length: 15.63
Episode 40      Last length:    17      Average length: 19.16
Episode 50      Last length:    77      Average length: 22.33
Episode 60      Last length:    52      Average length: 24.56
Episode 70      Last length:    67      Average length: 28.63
Episode 80      Last length:   127      Average length: 42.35
Episode 90      Last length:   154      Average length: 52.03
Episode 100     Last length:  1400      Average length: 120.72
Solved! Running reward is now 291.6995857731951 and the last episode runs to 9439 time steps!

(pytorch3) ➜ reinforcement_learning git:(master) python actor_critic.py

[2017-05-08 22:50:58,181] Making new env: CartPole-v0
Traceback (most recent call last):
  File "actor_critic.py", line 98, in <module>
    finish_episode()
  File "actor_critic.py", line 74, in finish_episode
    action.reinforce(r - value.data.squeeze())
  File "/Users/Tiger/anaconda/envs/pytorch3/lib/python3.5/site-packages/torch/autograd/variable.py", line 200, in reinforce
    self.creator._reinforce(reward)
  File "/Users/Tiger/anaconda/envs/pytorch3/lib/python3.5/site-packages/torch/autograd/stochastic_function.py", line 41, in _reinforce
    'x'.join(map(str, self.reward_info[1]))))
ValueError: got reward of size 1, but expected a tensor of size 1x1
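
The error message suggests a plain shape mismatch: the reward passed to `reinforce` has one dimension (size 1), while the sampled action apparently has two (size 1x1). The sketch below reproduces that kind of check in pure Python so it can be run without PyTorch. The helpers `check_reward_shape` and `shape_of` are hypothetical names made up for illustration, not the actual code in `torch/autograd/stochastic_function.py`, and nested lists stand in for tensors.

```python
# Hypothetical stand-in for the shape check implied by the traceback.
# This is NOT the real torch.autograd.stochastic_function implementation.
def check_reward_shape(reward_shape, expected_shape):
    """Raise ValueError if the reward shape differs from the expected shape."""
    if tuple(reward_shape) != tuple(expected_shape):
        raise ValueError(
            "got reward of size {}, but expected a tensor of size {}".format(
                "x".join(map(str, reward_shape)),
                "x".join(map(str, expected_shape)),
            )
        )


def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [[0.5]] -> (1, 1)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


reward = [0.5]      # one dimension, shape (1,): assumed to mirror the squeezed reward
expected = (1, 1)   # two dimensions: assumed shape of the sampled action

try:
    check_reward_shape(shape_of(reward), expected)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# Wrapping the reward in one more dimension makes the shapes agree,
# so the same check passes without raising.
reshaped = [reward]  # shape (1, 1)
check_reward_shape(shape_of(reshaped), expected)
```

If this reading is right, the mismatch would come from the example script and the installed PyTorch build disagreeing about the shape of the sampled action, which would fit a version skew between the two.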

lynic commented 7 years ago

I installed PyTorch from conda and cannot reproduce your problem.

(pytorch) ➜  reinforcement_learning git:(master) python actor_critic.py
[2017-07-09 13:09:50,068] Making new env: CartPole-v0
Episode 10      Last length:   132      Average length: 14.91
Episode 20      Last length:    21      Average length: 15.53
Episode 30      Last length:    50      Average length: 16.97
Episode 40      Last length:   199      Average length: 21.94
Episode 50      Last length:    26      Average length: 25.18
Episode 60      Last length:    35      Average length: 26.79
Episode 70      Last length:    43      Average length: 28.10

tigerneil commented 7 years ago

I pulled the latest code and it seems OK now. :) @lynic