Open captify-alapite opened 7 years ago
The TRPO learner currently isn't being checkpointed like the other algorithms, but I'll try to get a fix for that in tonight. Also note that, at present, you need to explicitly provide any flags that modify the architecture or agent behavior at test time as well, which in this case would be --arch FC --history_length 1 --activation tanh --frame_skip 1.
For now, I would recommend using the --use_monitor flag during training for any solved environments, since the primary objective in those environments is to minimize the number of training episodes needed to solve them.
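To illustrate why the architecture flags have to be repeated at test time, here is a minimal sketch, not the repository's actual argument handling. Only the four flag names are taken from this thread; the parser, the `ARCH_KEYS` tuple, and the `mismatched_flags` helper are hypothetical names introduced for illustration.

```python
import argparse

# Hypothetical parser: only the four flag names come from this thread.
# Defaults are left as None so no real default values are assumed.
def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--arch")
    parser.add_argument("--history_length", type=int)
    parser.add_argument("--activation")
    parser.add_argument("--frame_skip", type=int)
    return parser

# Flags that change the network or agent behavior (assumed grouping).
ARCH_KEYS = ("arch", "history_length", "activation", "frame_skip")

def mismatched_flags(train_args, test_args):
    """Return the names of flags whose values differ between two runs."""
    return [k for k in ARCH_KEYS if getattr(train_args, k) != getattr(test_args, k)]

flags = "--arch FC --history_length 1 --activation tanh --frame_skip 1".split()
train_args = build_parser().parse_args(flags)
test_args = build_parser().parse_args(flags)

# An empty list means the test-time flags match the training-time flags.
print(mismatched_flags(train_args, test_args))
```

If any of these flags were left out at test time, the helper would list them, which is the situation that would rebuild a different network than the one that was trained.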
Hey there, by any chance, do you still have plans for the following?
The TRPO learner currently isn't being checkpointed like the other algorithms, but I'll try to get a fix for that in tonight.
Thanks!
I do. I've been swamped with work recently, so I forgot to take care of this, but I should have time to fix it this weekend.
I've tried to use TRPO to create a model for CartPole-v0 by following the instructions on your OpenAI Gym page, changing the command to the following to reflect the API changes since the score was submitted:
This seems to work, with training proceeding as expected and concluding successfully. However, when I try to evaluate the trained model by running
I get the following error.