In `imagination-augmented-agent.py` the imagination core does not correctly one-hot encode the actions for the environment model, as shown in the lines below:
```python
if self.full_rollout:
    state = state.unsqueeze(0).repeat(self.num_actions, 1, 1, 1, 1).view(-1, *self.in_shape)
    action = torch.LongTensor([[i] for i in range(self.num_actions)]*batch_size)
    rollout_batch_size = batch_size * self.num_actions
else:
    action = self.distil_policy.act(Variable(state, volatile=True))
    action = action.data.cpu()
    rollout_batch_size = batch_size

for step in range(self.num_rolouts):
    onehot_action = torch.zeros(rollout_batch_size, self.num_actions, *self.in_shape[1:])
    onehot_action[range(rollout_batch_size), action] = 1
```
For a full rollout, `action` is built as a column vector of size `[rollout_batch_size, 1]`, so the line `onehot_action[range(rollout_batch_size), action] = 1` is not correct (and takes a very long time), because the two index tensors broadcast against each other instead of pairing up row by row. The fix is to add `action = action.view(-1)` before the one-hot assignment.
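A minimal, self-contained sketch of the issue (not the repository's code; the batch size, number of actions, and spatial shape below are made up for illustration):

```python
import torch

# Assumed toy sizes, standing in for the real batch_size / num_actions / in_shape.
batch_size, num_actions = 4, 6
rollout_batch_size = batch_size * num_actions

# Full-rollout action tensor as built above: shape [rollout_batch_size, 1] (a column vector).
action = torch.LongTensor([[i] for i in range(num_actions)] * batch_size)

onehot_action = torch.zeros(rollout_batch_size, num_actions, 5, 5)

# Buggy indexing: the [rollout_batch_size, 1] index broadcasts against
# range(rollout_batch_size), selecting rollout_batch_size * rollout_batch_size
# slices instead of one per row.
# onehot_action[range(rollout_batch_size), action] = 1

# Fix: flatten the actions to shape [rollout_batch_size] before indexing.
action = action.view(-1)
onehot_action[range(rollout_batch_size), action] = 1

# Each row now has exactly one action channel set.
assert onehot_action.sum().item() == rollout_batch_size * 5 * 5
```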