hollygrimm opened this issue 6 years ago
You're getting this error when obs.shape[2:] is an empty tuple -- in that case np.prod gets called with an empty shape, and np.prod of an empty shape returns 1.0, a float, whereas an integer is expected.
Now, I'm surprised obs.shape[2:] is ever empty. At least in the case of Walker2d, this happens when obs.shape == (E,), i.e. obs is an object array holding one variable-length episode per entry.
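A quick standalone repro of that behaviour (the episode lengths and observation dimension below are made up for illustration, not taken from the expert data):

import numpy as np

# Two episodes of different lengths: numpy can only store them as an
# object array of shape (E,), so obs.shape[2:] ends up empty.
ep1 = np.zeros((100, 17))
ep2 = np.zeros((80, 17))
obs = np.array([ep1, ep2], dtype=object)

print(obs.shape)               # (2,)
print(obs.shape[2:])           # ()
print(np.prod(obs.shape[2:]))  # 1.0 -- a float, not the integer the reshape expects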
I've introduced a fix in #491
I guess this error is caused by the episode lengths in the expert data not all being equal. I changed the code to the following:
if len(obs.shape[2:]) != 0:
    self.obs = np.reshape(obs, [-1, np.prod(obs.shape[2:])])
    self.acs = np.reshape(acs, [-1, np.prod(acs.shape[2:])])
else:
    self.obs = np.vstack(obs)
    self.acs = np.vstack(acs)
And it worked ok. Hope it helps.
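For anyone else hitting this, here is a self-contained sketch of what the two branches handle (the helper name and shapes are only illustrative, not from the baselines code):

import numpy as np

def flatten_episodes(obs):
    # Equal-length episodes: obs has shape (E, T, ob_dim), so obs.shape[2:] is (ob_dim,)
    # and the result is (E * T, ob_dim). The int() cast also guards against the
    # float issue described above.
    if len(obs.shape[2:]) != 0:
        return np.reshape(obs, [-1, int(np.prod(obs.shape[2:]))])
    # Ragged episodes: obs is an object array of (T_i, ob_dim) arrays,
    # so stacking along the first axis gives (sum(T_i), ob_dim).
    return np.vstack(obs)

equal = np.zeros((2, 100, 17))
ragged = np.array([np.zeros((100, 17)), np.zeros((80, 17))], dtype=object)
print(flatten_episodes(equal).shape)   # (200, 17)
print(flatten_episodes(ragged).shape)  # (180, 17)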
https://github.com/openai/baselines/blob/f2729693253c0ef4d4086231d36e0a4307ec1cb3/baselines/gail/dataset/mujoco_dset.py#L53
When I run run_mujoco.py, the code merged in #447 currently errors out for some of the envs (both deterministic and stochastic policies), using both MuJoCo 1.50 and 1.31. For MuJoCo 1.50, change env_id to '-v2' in the commands below:
Fails w/Deterministic Policy:
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Humanoid-v1 --expert_path=data/deterministic.trpo.Humanoid.0.00.npz
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Walker2d-v1 --expert_path=data/deterministic.trpo.Walker2d.0.00.npz
Fails w/Stochastic Policy:
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Hopper-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Hopper.0.00.npz
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Humanoid-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Humanoid.0.00.npz
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Walker2d-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Walker2d.0.00.npz
Works w/Deterministic Policy:
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HalfCheetah-v1 --expert_path=data/deterministic.trpo.HalfCheetah.0.00.npz
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Hopper-v1
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HumanoidStandup-v1 --expert_path=data/deterministic.trpo.HumanoidStandup.0.00.npz
Works w/Stochastic Policy:
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HalfCheetah-v1 --stochastic_policy --expert_path=data/stochastic.trpo.HalfCheetah.0.00.npz
python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HumanoidStandup-v1 --stochastic_policy --expert_path=data/stochastic.trpo.HumanoidStandup.0.00.npz
When I use the original code along with the original bug fix https://github.com/openai/baselines/pull/447/commits/22cab3d980d76e72e1658744421a6d52fc5bb1b8 suggested by @AdamGleave, training works fine for all the environments.