openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Simple way to run trained policy with PPO1/PPO2/TRPO? #263

Open mfe7 opened 6 years ago

mfe7 commented 6 years ago

First of all, thank you for providing these great baselines!

I can train policies with the various algorithms (PPO1/PPO2/TRPO) and see that average reward increases and loss decreases, but is there a straightforward way to then evaluate the learned policy on an environment over a range of different initial conditions?

In PPO2, there's a way to save the model checkpoint, but is there any documentation about how to load that checkpoint? If not, do you have any suggestions on how best I might add this?

This functionality seems to be included in deepq, which provides train.py and enjoy.py scripts for sample environments; it is simpler there because training produces a single .pkl file that the enjoy script can load in one line.
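For reference, the deepq pattern I mean looks roughly like this (a sketch from memory of the old enjoy_cartpole.py; the model filename is just an example):

```python
import gym
from baselines import deepq

# Load the .pkl that the deepq training script produced (filename is an example).
act = deepq.load("cartpole_model.pkl")

env = gym.make("CartPole-v0")
obs, done = env.reset(), False
episode_rew = 0.0
while not done:
    env.render()
    # The loaded act function expects a batch of observations, hence obs[None].
    obs, rew, done, _ = env.step(act(obs[None])[0])
    episode_rew += rew
print("Episode reward:", episode_rew)
```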

20chase commented 6 years ago

@mfe7 Hi,

I have a simple way to load a checkpoint for ppo2. Here is my code:
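A minimal sketch of the idea (this assumes the load_path argument of ppo2.learn; the environment, network, and checkpoint path are placeholders, and the env must be wrapped the same way as during training):

```python
import gym
from baselines.ppo2 import ppo2
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

# Wrap the environment the same way it was wrapped for training.
env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

# total_timesteps=0 skips the training loop; load_path restores the
# saved checkpoint (placeholder path).
model = ppo2.learn(network='mlp', env=env, total_timesteps=0,
                   load_path='./checkpoints/ppo2_cartpole')

obs = env.reset()
for _ in range(1000):
    # model.step returns actions, value estimates, recurrent states, neglogpacs.
    actions, values, states, neglogpacs = model.step(obs)
    obs, rewards, dones, infos = env.step(actions)
    env.render()
```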

sreejithb commented 5 years ago

> @mfe7 Hi,
>
> I have a simple way to load a checkpoint for ppo2. Here is my code:

Hi Sir,

I am still struggling to figure this out. I trained a policy with PPO2 and saved checkpoints via the --save_path argument, but now I am stuck: how do I actually run the trained policy?

Any help would be appreciated.

anton-matosov commented 5 years ago

For those looking for the answer: here is a good notebook with examples, and it includes a link to a Medium article explaining the same: https://colab.research.google.com/drive/1KoAQ1C_BNtGV3sVvZCnNZaER9rstmy0s
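If you train through the baselines.run entry point (as in the --save_path workflow above), the repo README also shows how to reload a saved model and watch it play, by passing --num_timesteps=0 together with --load_path and --play (the env and paths below are the README's example; swap in your own):

```bash
# Train and save a model (README example).
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2

# Reload the saved model and visualize it, without further training.
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
```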