Closed Akella17 closed 5 years ago
Right now it works for me for gym's continuous control environments (mujoco).
Can you send me an example where it fails?
The algorithm works for MuJoCo either way. However, it fails for Gym's Classic Control environments (e.g. Pendulum-v0) or any other Gym environment with output_dim = 1. The squeeze operation reduces the action batch from [batch_size, 1] to [batch_size,], so each environment receives a scalar instead of a length-1 vector when the action is passed to env.step(), which in turn raises an error.
What I have observed is that removing the squeeze() method makes the algorithm compatible with Classic Control environments while having no effect on MuJoCo or other working environments.
I want to know the significance of the squeeze operation (line 162) in a2c_ppo_acktr/envs.py. The squeeze operation sends scalar values as action_value instead of single-element vectors for environments with action_dim = 1.
Suggested correction: Removing the squeeze operation allows training Gym's continuous control environments in addition to existing environments.