openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

PPO2 has fixed batch size in observation placeholder, disallowing querying the action for a single observation. #515

Open pimdh opened 6 years ago

pimdh commented 6 years ago

When using the run script with a trained PPO2 model, it fails on model.step(obs) for a single observation, since the observation placeholder has a fixed batch size equal to the batch size chosen for training. Removing the fixed batch size at https://github.com/openai/baselines/blob/master/baselines/common/policies.py#L125 resolves this.
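To illustrate the failure mode, here is a minimal NumPy sketch (not the baselines code; make_step_fn is a hypothetical stand-in) of the shape check TensorFlow performs when a placeholder is created with a fixed batch dimension instead of None:

```python
import numpy as np

def make_step_fn(batch_size):
    # Mimic a policy whose input "placeholder" is pinned to a fixed
    # batch size, as when it is built with shape [nbatch, *ob_shape]
    # rather than [None, *ob_shape].
    def step(obs):
        obs = np.asarray(obs)
        if batch_size is not None and obs.shape[0] != batch_size:
            raise ValueError(
                f"cannot feed value of shape {obs.shape} for input "
                f"with fixed batch size {batch_size}")
        return np.zeros(obs.shape[0], dtype=int)  # dummy actions
    return step

fixed = make_step_fn(8)        # batch dimension pinned to 8
flexible = make_step_fn(None)  # batch dimension left unspecified

single_obs = np.zeros((1, 84, 84, 4))  # one Atari-style observation
flexible(single_obs)           # works: returns one action
try:
    fixed(single_obs)          # fails: input expects a batch of 8
except ValueError as e:
    print(e)
```

The trained model behaves like `fixed` here: its input shape was frozen at build time, so a batch of one is rejected.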

pzhokhov commented 6 years ago

ppo2 needs a fixed batch size when dealing with recurrent policies. Could you provide more details on the failure? I suspect it has something to do with the number of environments simulated in parallel being larger than one, and then calling model.step() with obs coming from a single env. In that case, you need to re-load the weights into a model that has been created for a single environment. For instance:

# train the model with 8 environments in parallel:
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_env=8 --save_path=~/models/pongmodel --num_timesteps=1e6 
# load the model and render single environment:
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_env=1 --load_path=~/models/pongmodel --num_timesteps=0 --play
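The re-load trick above works because the trained network weights are independent of the batch size; only the input placeholder shape is pinned. A minimal sketch of that idea (TinyPolicy is hypothetical, not the baselines implementation):

```python
import numpy as np

class TinyPolicy:
    # Toy linear policy whose input batch size is fixed at build time,
    # mirroring how the PPO2 graph pins its observation placeholder.
    def __init__(self, nbatch, ob_dim=4, n_actions=2):
        rng = np.random.default_rng(0)
        self.nbatch = nbatch  # fixed batch size of the input
        self.w = rng.normal(size=(ob_dim, n_actions))

    def save(self):
        return {"w": self.w}

    def load(self, params):
        self.w = params["w"]  # weights carry over unchanged

    def step(self, obs):
        obs = np.atleast_2d(obs)
        assert obs.shape[0] == self.nbatch, "batch size mismatch"
        return np.argmax(obs @ self.w, axis=1)

trained = TinyPolicy(nbatch=8)     # built for 8 parallel environments
single = TinyPolicy(nbatch=1)      # rebuilt for a single environment
single.load(trained.save())        # same weights, new batch size
action = single.step(np.ones(4))   # a single observation now works
```

This is exactly what the two-command recipe does: the second run rebuilds the graph with --num_env=1 and restores the saved weights into it.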
pimdh commented 6 years ago

Ah, thanks, that solves it. Maybe the README can be updated? Currently it contains the failing command:

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num-timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
pzhokhov commented 6 years ago

Good point, thanks! I'll update either the README or the code to make the line above work.