ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars 280 forks

Can't work on pytorch 0.4.0 #52

Closed jiakai0419 closed 6 years ago

jiakai0419 commented 6 years ago

macOS 10.13.4 Python 3.6.4 pytorch 0.4.0

I encountered an error:

~/py-garage/pytorch-a3c(master*) » python3 main.py --env-name "PongDeterministic-v4" --num-processes 1                                                        anya@turing-machine
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: <class 'envs.AtariRescale42x42'> doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method.
WARN: <class 'envs.AtariRescale42x42'> doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method.
/Users/anya/py-garage/pytorch-a3c/test.py:37: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  cx = Variable(torch.zeros(1, 256), volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:38: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  hx = Variable(torch.zeros(1, 256), volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:44: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  state.unsqueeze(0), volatile=True), (hx, cx)))
/Users/anya/py-garage/pytorch-a3c/train.py:55: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logit)
/Users/anya/py-garage/pytorch-a3c/test.py:45: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logit)
/Users/anya/py-garage/pytorch-a3c/train.py:56: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_prob = F.log_softmax(logit)
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/anya/py-garage/pytorch-a3c/train.py", line 60, in train
    action = prob.multinomial().data
TypeError: multinomial() missing 1 required positional arguments: "num_samples"
/Users/anya/py-garage/pytorch-a3c/test.py:40: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  cx = Variable(cx.data, volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:41: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  hx = Variable(hx.data, volatile=True)
Time 00h 00m 01s, num steps 0, FPS 0, episode reward -21.0, episode length 812

I tried to fix the first error (TypeError: multinomial() missing 1 required positional arguments: "num_samples") with this patch:

diff --git a/train.py b/train.py
index 1b9c139..e3f0143 100644
--- a/train.py
+++ b/train.py
@@ -57,7 +57,7 @@ def train(rank, args, shared_model, counter, lock, optimizer=None):
             entropy = -(log_prob * prob).sum(1, keepdim=True)
             entropies.append(entropy)

-            action = prob.multinomial().data
+            action = prob.multinomial(num_samples=1).data
             log_prob = log_prob.gather(1, Variable(action))

             state, reward, done, _ = env.step(action.numpy())
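
For what it's worth, a quick sanity check of the new signature on 0.4.0 (logit here is just a stand-in for the model's output, not the real network):

import torch
import torch.nn.functional as F

logit = torch.randn(1, 6)                 # stand-in for the action logits (1 x num_actions)
prob = F.softmax(logit, dim=1)            # dim passed explicitly, as the warning asks
action = prob.multinomial(num_samples=1)  # 0.4.0 requires num_samples to be given explicitly
print(action)                             # a 1 x 1 LongTensor holding the sampled action index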

With that patch applied, I ran into a new error:

~/py-garage/pytorch-a3c(master*) » python3 main.py --env-name "PongDeterministic-v4" --num-processes 1                                                        anya@turing-machine
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: <class 'envs.AtariRescale42x42'> doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method.
WARN: <class 'envs.AtariRescale42x42'> doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method.
/Users/anya/py-garage/pytorch-a3c/test.py:37: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  cx = Variable(torch.zeros(1, 256), volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:38: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  hx = Variable(torch.zeros(1, 256), volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:44: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  state.unsqueeze(0), volatile=True), (hx, cx)))
/Users/anya/py-garage/pytorch-a3c/train.py:55: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logit)
/Users/anya/py-garage/pytorch-a3c/train.py:56: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_prob = F.log_softmax(logit)
/Users/anya/py-garage/pytorch-a3c/test.py:45: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logit)
/Users/anya/py-garage/pytorch-a3c/test.py:40: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  cx = Variable(cx.data, volatile=True)
/Users/anya/py-garage/pytorch-a3c/test.py:41: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  hx = Variable(hx.data, volatile=True)
/Users/anya/py-garage/pytorch-a3c/train.py:108: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  torch.nn.utils.clip_grad_norm(model.parameters(), args.max_grad_norm)
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/anya/py-garage/pytorch-a3c/train.py", line 111, in train
    optimizer.step()
  File "/Users/anya/py-garage/pytorch-a3c/my_optim.py", line 70, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)
TypeError: addcdiv_() takes 2 positional arguments but 3 were given
Time 00h 00m 01s, num steps 20, FPS 13, episode reward -21.0, episode length 812

I am very confused by this one and cannot fix it: TypeError: addcdiv_() takes 2 positional arguments but 3 were given

ikostrikov commented 6 years ago

Fixed in https://github.com/ikostrikov/pytorch-a3c/commit/e898f7514a03de73a2bf01e7b0f17a6f93963389
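
For context, my understanding is that in 0.4.0 indexing a 1-element tensor returns a 0-dim tensor instead of a Python number, so step_size in my_optim.py ends up being a tensor and addcdiv_ resolves to the wrong overload. Roughly this (a simplified sketch for 0.4.x; the commit above has the exact change):

import torch

# Stand-ins for the optimizer buffers in my_optim.py.
p = torch.zeros(3)
exp_avg, denom = torch.ones(3), torch.ones(3)

step = torch.zeros(1)                            # the shared step counter
step += 1
step_size = 1e-4 / (1 - 0.9 ** step[0])          # step[0] is a 0-dim tensor, so step_size is too
# p.addcdiv_(-step_size, exp_avg, denom)         # TypeError: takes 2 positional arguments but 3 were given
p.addcdiv_(-step_size.item(), exp_avg, denom)    # converting to a Python number restores the old behaviour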

There are also some warnings, but they do not affect the performance of the algorithm.

I will fix them closer to the 0.5.0 release.
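
Roughly, the 0.4-style replacements for those warnings would look like this (an untested sketch, not the exact code that will land):

import torch
import torch.nn.functional as F

# volatile=True -> wrap the test-time forward pass in torch.no_grad()
with torch.no_grad():
    cx = torch.zeros(1, 256)
    hx = torch.zeros(1, 256)
    # value, logit, (hx, cx) = model((state.unsqueeze(0), (hx, cx)))  # as in test.py

# implicit softmax/log_softmax dim -> pass dim explicitly
logit = torch.randn(1, 6)                          # stand-in for the model output (1 x num_actions)
prob = F.softmax(logit, dim=1)
log_prob = F.log_softmax(logit, dim=1)

# clip_grad_norm -> the in-place variant clip_grad_norm_
model = torch.nn.Linear(4, 2)                      # stand-in for the actor-critic model
model(torch.randn(1, 4)).sum().backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 50)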

jiakai0419 commented 6 years ago

I'm really impressed by how quickly you responded. Thank you!

jiakai0419 commented 6 years ago

@ikostrikov I think the performance is affected on version 0.4.0. Here is what I see:

python3 main.py --env-name "PongDeterministic-v4" --num-processes 16
Time 00h 00m 09s, num steps 5031, FPS 519, episode reward -21.0, episode length 812
Time 00h 01m 10s, num steps 35482, FPS 501, episode reward -2.0, episode length 100
Time 00h 02m 11s, num steps 66664, FPS 505, episode reward -2.0, episode length 100
Time 00h 03m 13s, num steps 97058, FPS 503, episode reward -2.0, episode length 100
Time 00h 04m 14s, num steps 128517, FPS 504, episode reward -2.0, episode length 108
Time 00h 05m 24s, num steps 163141, FPS 502, episode reward -21.0, episode length 764
Time 00h 06m 34s, num steps 200426, FPS 508, episode reward -21.0, episode length 764
Time 00h 07m 57s, num steps 245725, FPS 514, episode reward -21.0, episode length 1942
Time 00h 09m 16s, num steps 284730, FPS 511, episode reward -21.0, episode length 1324
Time 00h 10m 41s, num steps 325153, FPS 507, episode reward -21.0, episode length 1324
Time 00h 12m 01s, num steps 361563, FPS 501, episode reward -21.0, episode length 1324
Time 00h 13m 28s, num steps 406910, FPS 503, episode reward -21.0, episode length 1964
Time 00h 14m 53s, num steps 450836, FPS 505, episode reward -21.0, episode length 1964
Time 00h 16m 22s, num steps 493876, FPS 503, episode reward -21.0, episode length 1964
ikostrikov commented 6 years ago

How many cores do you have on your machine?

It seems to be learning something, since the length of the episodes goes up.

jiakai0419 commented 6 years ago
  Number of Processors: 1
  Total Number of Cores:    2

I will test it on a 64-core machine.

ikostrikov commented 6 years ago

On a 2-core machine it will just take a lot of time. I would expect a decent reward after about 1 hour of training on Pong.

ph-dev-2016 commented 6 years ago

I'm using pytorch-cpu 0.4.1 and Python 3.7 on Windows 7.

I still see this error: TypeError: multinomial() missing 1 required positional arguments: "num_samples"

ikostrikov commented 6 years ago

@ph-dev-2016 Try with the most recent version of this repository.