higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / Prioritized Replay / Noisy Networks / Distributional Values / Rainbow / Hierarchical RL

Error - possibly due to "Variable()" ? #22

Open joleeson opened 5 years ago

joleeson commented 5 years ago

Hi, many thanks for sharing the code.

I have experienced an error running 1.dqn straight out of the box. The error message shown after I run the 12th cell of code is as shown below.

My computer is running PyTorch 0.4.1, and I suspect that the error is due to a change in the "Variable" API (as used in cells 8 and 10, for example). If so, has anyone updated the code for the latest PyTorch 0.4.1?

Any ideas would be appreciated! Thanks in advance!


Error message after cell 12:


/home/USER/anaconda3/envs/RL/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.


AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
     12     action = model.act(state, epsilon)
     13
---> 14     next_state, reward, done, _ = env.step(action)
     15     replay_buffer.push(state, action, reward, next_state, done)
     16

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     29     def step(self, action):
     30         assert self._episode_started_at is not None, "Cannot call env.step() before calling reset()"
---> 31         observation, reward, done, info = self.env.step(action)
     32         self._elapsed_steps += 1
     33

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py in step(self, action)
     52
     53     def step(self, action):
---> 54         assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
     55         state = self.state
     56         x, x_dot, theta, theta_dot = state

AssertionError: tensor(0) (<class 'torch.Tensor'>) invalid
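
For context, the act() method that produces the action (cell 8) looks roughly like this; I'm paraphrasing from the notebook, so the exact lines may differ slightly:

    def act(self, state, epsilon):
        if random.random() > epsilon:
            state = Variable(torch.FloatTensor(state).unsqueeze(0), volatile=True)
            q_value = self.forward(state)
            action = q_value.max(1)[1].data[0]   # on PyTorch 0.4.1 this is a 0-dim tensor, not an int
        else:
            action = random.randrange(env.action_space.n)
        return action

If I understand correctly, on older PyTorch the indexing on that line returned a plain Python int, whereas on 0.4.1 it returns something like tensor(0), which is exactly what the action_space.contains() assertion above is rejecting.
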
kinghs commented 5 years ago

Modify the code as follows:

    action = q_value.max(1)[1].data[0]  ->  action = q_value.max(1)[1].item()
    losses.append(loss.data[0])         ->  losses.append(loss.item())

It works for me, and my PyTorch version is 1.0.
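
A quick way to sanity-check the change outside the notebook (illustrative snippet only, not code from the repo):

    import gym
    import torch

    env = gym.make("CartPole-v0")
    state = env.reset()

    q_value = torch.randn(1, env.action_space.n)   # stand-in for the network output
    action = q_value.max(1)[1].item()              # plain Python int instead of tensor(0)

    assert env.action_space.contains(action)       # the assertion that failed before
    next_state, reward, done, _ = env.step(action)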

joleeson commented 5 years ago

> Modify the code as follows:
>
>     action = q_value.max(1)[1].data[0]  ->  action = q_value.max(1)[1].item()
>     losses.append(loss.data[0])         ->  losses.append(loss.item())
>
> It works for me, and my PyTorch version is 1.0.

Hi kinghs, many thanks for your reply.

For the benefit of other users who may or may not be familiar with PyTorch: I assume you made your suggested modification because Variables now "return tensors instead of variables". See the PyTorch 0.4.0 documentation on Variable (deprecated).
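
Concretely, on PyTorch >= 0.4 (a tiny illustration, not code from the notebook):

    import torch

    q_value = torch.tensor([[0.1, 0.9]])
    q_value.max(1)[1].data[0]   # tensor(1) -- indexing now yields a 0-dim tensor
    q_value.max(1)[1].item()    # 1         -- a plain Python int, which gym's Discrete space accepts
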

The issue was raised here: https://github.com/higgsfield/RL-Adventure/pull/20

It is also possible to create tensors directly, e.g. state = torch.FloatTensor(state, device=device), but it appears the changes you mentioned are still necessary.
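
For anyone updating the notebooks more thoroughly, here is a sketch of what the acting code could look like with the 0.4+ idioms combined (torch.no_grad() instead of volatile=True, explicit device, .item()); the function name and signature are just illustrative, not the notebook's:

    import random
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    def select_action(model, state, epsilon, n_actions):
        if random.random() > epsilon:
            with torch.no_grad():                       # replaces Variable(..., volatile=True)
                state_t = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
                q_value = model(state_t)
            return q_value.max(1)[1].item()             # plain Python int for env.step()
        return random.randrange(n_actions)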