Error - possibly due to "Variable()" ? #22

Open joleeson opened 5 years ago

joleeson commented 5 years ago

Hi, many thanks for sharing the code.

I hit an error running 1.dqn straight out of the box. The error message that appears after I run the 12th code cell is shown below.

My computer is running with PyTorch 0.4.1, and I suspect that the error is due to a change in the "Variable" API (as used in cells 8 and 10 for example)? If so, has anyone updated the code for the latest PyTorch 0.4.1?

Any ideas would be appreciated! Thanks in advance!

Error message after cell 12:

/home/USER/anaconda3/envs/RL/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.

AssertionError Traceback (most recent call last)

in ()
     12     action = model.act(state, epsilon)
     13
---> 14     next_state, reward, done, _ = env.step(action)
     15     replay_buffer.push(state, action, reward, next_state, done)
     16

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     29     def step(self, action):
     30         assert self._episode_started_at is not None, "Cannot call env.step() before calling reset()"
---> 31         observation, reward, done, info = self.env.step(action)
     32         self._elapsed_steps += 1
     33

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py in step(self, action)
     52
     53     def step(self, action):
---> 54         assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
     55         state = self.state
     56         x, x_dot, theta, theta_dot = state

AssertionError: tensor(0) () invalid

kinghs commented 5 years ago

Modify the code as follows:

action = q_value.max(1)[1].data[0] -> action = q_value.max(1)[1].item()
losses.append(loss.data[0]) -> losses.append(loss.item())

It works for me; my PyTorch version is 1.0.
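
For readers who want to guard against this in one place, here is a minimal sketch of the same idea. It assumes only that the object exposes a callable `.item()` method, as one-element PyTorch tensors and NumPy scalars do. The helper name `to_python_scalar` and the `FakeTensor` stand-in are hypothetical and are not part of the notebook or of PyTorch; the stand-in is there only so the snippet runs without PyTorch installed.

```python
def to_python_scalar(value):
    """Return a plain Python number for one-element tensor-like objects.

    Objects exposing a callable ``.item()`` (e.g. PyTorch tensors, NumPy
    scalars) are unwrapped; anything else is returned unchanged.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


class FakeTensor:
    """Hypothetical stand-in for a one-element torch.Tensor (demo only)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


# A tensor-like action becomes a plain int, which is what the assertion
# in the traceback above rejects a tensor for not being.
action = to_python_scalar(FakeTensor(0))
print(type(action).__name__, action)  # prints: int 0
```

In the notebook this would presumably be used as `action = to_python_scalar(q_value.max(1)[1])` and `losses.append(to_python_scalar(loss))`, which is equivalent to calling `.item()` directly as suggested above.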

joleeson commented 5 years ago

Hi kinghs, Many thanks for your reply.

For the benefit of other users who may or may not be familiar with PyTorch: I assume you made your suggested modification because Variables now "return tensors instead of variables". See the PyTorch 0.4.0 documentation on Variable (deprecated).

The issue was raised here: https://github.com/higgsfield/RL-Adventure/pull/20

It is also possible to create tensors directly, e.g. state = torch.FloatTensor(state, device=device), but the changes you mentioned still appear to be necessary.