openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

TypeError: 'numpy.float64' in GAIL dataset loading #489

Open hollygrimm opened 6 years ago

hollygrimm commented 6 years ago

https://github.com/openai/baselines/blob/f2729693253c0ef4d4086231d36e0a4307ec1cb3/baselines/gail/dataset/mujoco_dset.py#L53

When I run run_mujoco.py, the code merged in #447 is currently erroring out with

TypeError: 'numpy.float64' object cannot be interpreted as an integer

for some of the envs (deterministic and stochastic) using both MuJoCo 1.50 and 1.31. For MuJoCo 1.50, change the env_id suffix from '-v1' to '-v2' in the commands below:

Fails w/Deterministic Policy:

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Humanoid-v1 --expert_path=data/deterministic.trpo.Humanoid.0.00.npz

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Walker2d-v1 --expert_path=data/deterministic.trpo.Walker2d.0.00.npz

Fails w/Stochastic Policy:

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Hopper-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Hopper.0.00.npz

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Humanoid-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Humanoid.0.00.npz

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Walker2d-v1 --stochastic_policy --expert_path=data/stochastic.trpo.Walker2d.0.00.npz

Works w/Deterministic Policy:

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HalfCheetah-v1 --expert_path=data/deterministic.trpo.HalfCheetah.0.00.npz

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=Hopper-v1

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HumanoidStandup-v1 --expert_path=data/deterministic.trpo.HumanoidStandup.0.00.npz

Works w/Stochastic Policy:

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HalfCheetah-v1 --stochastic_policy --expert_path=data/stochastic.trpo.HalfCheetah.0.00.npz

python -m baselines.gail.run_mujoco --traj_limitation=1 --env_id=HumanoidStandup-v1 --stochastic_policy --expert_path=data/stochastic.trpo.HumanoidStandup.0.00.npz

When I use the original code, along with the original bug fix https://github.com/openai/baselines/pull/447/commits/22cab3d980d76e72e1658744421a6d52fc5bb1b8 suggested by @AdamGleave, training works fine for all the environments.

AdamGleave commented 6 years ago

You're getting this error when obs.shape[2:] is empty. In that case np.prod gets called on an empty shape, and np.prod([]) returns 1.0, a float, whereas np.reshape expects an integer dimension.
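To make the float-vs-integer point concrete, here is a small standalone sketch. It is not the baselines code; `feature_dim` is a hypothetical helper name, and the array shapes are made up for illustration.

```python
import numpy as np

# The product over an empty sequence is the float 1.0 (the empty product).
empty_prod = np.prod([])

# Passing that float as a dimension makes np.reshape raise a TypeError.
reshape_error = None
try:
    np.reshape(np.zeros(6), [-1, np.prod([])])
except TypeError as err:
    reshape_error = str(err)


def feature_dim(shape):
    """Hypothetical helper: product of the dims after the first two, as a plain int."""
    return int(np.prod(shape[2:]))


# Made-up batch: 3 episodes, 5 timesteps, observations of shape (4, 2).
batch = np.zeros((3, 5, 4, 2))
flat = np.reshape(batch, [-1, feature_dim(batch.shape)])
```

Note that casting to int only avoids the TypeError. If the trailing shape really is empty, the reshape would use a feature dimension of 1, which is probably not what is wanted, so the underlying shape of the data still needs to be checked.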

Now, I'm surprised obs.shape[2:] is ever empty. At least in the case of Walker2d, this happens when obs.shape == (E,), with the arrays being objects.
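One way a shape of (E,) with object entries can arise, as an assumption on my part rather than something verified against the expert data files, is when the per-episode arrays have different lengths, so NumPy cannot stack them into a regular (E, T, D) array. A minimal sketch with made-up episode lengths:

```python
import numpy as np

# Three hypothetical episodes of different lengths, each with 4-dim observations.
episodes = [np.zeros((t, 4)) for t in (5, 7, 3)]

# Ragged data has to live in a 1-D object array, one entry per episode.
ragged = np.empty(len(episodes), dtype=object)
for i, ep in enumerate(episodes):
    ragged[i] = ep

print(ragged.shape)      # (3,)
print(ragged.shape[2:])  # ()

# Equal-length episodes stack into a regular 3-D array instead.
equal = np.stack([np.zeros((5, 4)) for _ in range(3)])
print(equal.shape)       # (3, 5, 4)
print(equal.shape[2:])   # (4,)
```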

I've introduced a fix in #491

mingfeisun commented 5 years ago

I guess this error is caused by the episode lengths in the expert data not all being equal. I changed the code to the following:

        if len(obs.shape[2:]) != 0:
            self.obs = np.reshape(obs, [-1, np.prod(obs.shape[2:])])
            self.acs = np.reshape(acs, [-1, np.prod(acs.shape[2:])])
        else:
            self.obs = np.vstack(obs)
            self.acs = np.vstack(acs)

And it worked ok. Hope it helps.

See https://github.com/mingfeisun/baselines/blob/ab7540e5a815558332354f018ba7d278933403c9/baselines/gail/dataset/mujoco_dset.py#L53
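For anyone who wants to try the idea outside the dataset class, here is a self-contained sketch of the same branch. `flatten_episodes` is a hypothetical function name, the test arrays are made up, and the int() cast is my addition rather than part of the change above.

```python
import numpy as np


def flatten_episodes(arr):
    """Hypothetical helper: merge all episodes into one 2-D (N, D) array."""
    if len(arr.shape[2:]) != 0:
        # Regular (E, T, ...) array: merge episodes and timesteps,
        # and flatten the remaining dims into one feature axis.
        return np.reshape(arr, [-1, int(np.prod(arr.shape[2:]))])
    # Otherwise treat arr as a sequence of per-episode (T_i, D) arrays
    # and stack them along the time axis.
    return np.vstack(list(arr))


# Equal-length case: 3 episodes x 5 steps x 4 features.
equal = np.zeros((3, 5, 4))
flat_equal = flatten_episodes(equal)

# Ragged case: episode lengths 5, 7 and 3, stored in an object array.
ragged = np.empty(3, dtype=object)
for i, t in enumerate((5, 7, 3)):
    ragged[i] = np.ones((t, 4))
flat_ragged = flatten_episodes(ragged)

print(flat_equal.shape)   # (15, 4)
print(flat_ragged.shape)  # (15, 4)
```

Both branches end up with one row per timestep, so downstream code that indexes into obs and acs by transition should not need to care which layout the expert file used.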