Closed captify-alapite closed 7 years ago
If you try it with the flag '--history_length 1' it should work. I'm appending the timestep to the observation for fitting the value function, and at present it's not coded to deal with the case where the network input is several observations concatenated together.
I should refactor this to work with more general history lengths and, in the meantime, raise an explicit error indicating that trpo expects a history_length of 1.
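For illustration, the interim guard and the timestep-appending step might look roughly like the sketch below. This is not the repository's actual code: the function names `check_history_length` and `value_function_input` are hypothetical, and the observation is assumed to be a flat sequence of floats.

```python
from typing import List, Sequence


def check_history_length(history_length: int) -> None:
    """Hypothetical guard: fail fast instead of hitting a shape error later."""
    if history_length != 1:
        raise ValueError(
            f"trpo expects a history_length of 1, got {history_length}"
        )


def value_function_input(observation: Sequence[float], timestep: int) -> List[float]:
    """Hypothetical helper: append the timestep to a single observation.

    This only makes sense when the input is one observation; with several
    observations concatenated together it is unclear where the timestep
    belongs, which is why the guard above rejects history_length != 1.
    """
    return list(observation) + [float(timestep)]
```

Calling `check_history_length` once at startup would turn the current failure into an explicit message until the refactor for general history lengths is done.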
That worked great. Thanks again for the quick response!
Hi, thanks for your quick response to the previous issue I submitted. I've been trying out training with the MountainCarContinuous-v0 environment, and have been able to run it with all of the continuous algorithms other than trpo-continuous, which gives me the following error.