openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License
15.72k stars 4.87k forks

Save and load of the trained TRPO and PPO agents #74

Closed. ViktorM closed this issue 7 years ago.

ViktorM commented 7 years ago

Hi,

How can saving, loading, and visualization of agents trained with TRPO or PPO be done?

ashwinreddy commented 7 years ago

Hello,

I think I've figured out how to do it, so I thought I would describe the process and possibly submit a pull request so other people can get started experimenting.

The run_atari.py and run_mujoco.py files will go through the training/learning process and display some running statistics, but they do not save the network's model or parameters when the script completes. Luckily, you can use the tf.train.Saver class to do just that. It's as easy as instantiating the saver and then calling the save method with a session and a file prefix.

Then, in another file, you'll want to instantiate the same policy (the default is an MlpPolicy) with the same parameters (hidden size, hidden layers, etc.). Then, use the saver class to restore the model, and voilà, you can call the act function and get back actions. If you want to see the policy in action, just ask the environment to render itself (I assume that's what you meant by visualization).

This solution is a little janky and could probably be improved using the ActWrapper class that the DQN method uses.

joschu commented 7 years ago

Thanks for the explanation. I also use tf.train.Saver; however, it's not in the released code because I figured that people would probably want to roll their own solutions.

dattatreya303 commented 6 years ago

Can we do this in pposgd_learn.py? I want to save the model after every 100 iterations, say. Would simply calling U.save_state(fname) work there?

zishanahmed08 commented 6 years ago

@ashwinreddy Hi Ashwin,

I referred to your commit and am getting the error below, which suggests a shape mismatch: the dimension somehow changes from 22528 to 3136. The only change from your code is that I replace the MLP policy with cnn_policy, since run_atari trains the Pong env with CnnPolicy. Any idea what's going wrong?

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [22528,512] rhs shape= [3136,512] [[Node: save/Assign_7 = Assign[T=DT_FLOAT, _class=["loc:@pi/lin/w"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](pi/lin/w, save/RestoreV2_7)]]

jlabhishek commented 6 years ago

I am also facing tensor shape mismatch issue, anyone got a workaround?

zishanahmed08 commented 6 years ago

@BlazingFire The error for me was arising because I had missed one of the wrappers. I think it was the FrameStack wrapper, which stacks the 4 frames; hence the mismatch in dimensions.
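The mismatch makes sense once you note that the dense layer's weight shape depends on the flattened conv output, which in turn depends on the observation shape the env wrappers produce. A small sketch of the conv arithmetic for the standard Atari CNN (three conv layers, valid padding, 64 filters in the last layer; the exact layer sizes are the commonly used Nature-CNN values, stated here as an assumption):

```python
def conv_out(size, kernel, stride):
    """Output size of a valid-padding conv along one dimension."""
    return (size - kernel) // stride + 1

def flat_size(h, w):
    """Flattened size after the three Atari conv layers (64 final filters)."""
    for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
        h = conv_out(h, kernel, stride)
        w = conv_out(w, kernel, stride)
    return h * w * 64

# With the deepmind wrappers the observation is 84x84, which gives the
# rhs shape from the error above; a differently wrapped env feeds a
# different shape through the convs and yields a different lhs.
print(flat_size(84, 84))  # 3136
```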

williamjqk commented 6 years ago

In ashwinreddy's fork there is an example, but it has since been removed from openai/baselines.