Can't use MountainCarContinuous-v0 with trpo-continuous

captify-alapite commented 7 years ago

Hi, thanks for your quick response to the previous issue I submitted. I've been trying out training with the MountainCarContinuous-v0 environment, and have been able to run it with all of the continuous algorithms other than trpo-continuous, which gives me the following error.


```
[2017-05-24 17:04:45] INFO [MainThread:222] Error reported to Coordinator: <type 'exceptions.ValueError'>, all the input array dimensions except for the concatenation axis must match exactly
[2017-05-24 17:04:45,735] Error reported to Coordinator: <type 'exceptions.ValueError'>, all the input array dimensions except for the concatenation axis must match exactly
Process TRPOLearner-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/abiolalapite/Documents/Code/ThirdParty/tensorflow-rl/algorithms/actor_learner.py", line 254, in run
    self.train()
  File "/Users/abiolalapite/Documents/Code/ThirdParty/tensorflow-rl/algorithms/trpo_actor_learner.py", line 358, in train
    self._run_master()
  File "/Users/abiolalapite/Documents/Code/ThirdParty/tensorflow-rl/algorithms/trpo_actor_learner.py", line 337, in _run_master
    values = self.predict_values(worker_data)
  File "/Users/abiolalapite/Documents/Code/ThirdParty/tensorflow-rl/algorithms/trpo_actor_learner.py", line 229, in predict_values
    'timestep': np.array(data['timestep'])})
  File "/Users/abiolalapite/Documents/Code/ThirdParty/tensorflow-rl/algorithms/trpo_actor_learner.py", line 221, in preprocess_value_state
    return np.hstack([data['state'], data['timestep'].reshape(-1, 1, 1)])
  File "/Users/abiolalapite/intellij-tf/lib/python2.7/site-packages/numpy/core/shape_base.py", line 288, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
```

steveKapturowski commented 7 years ago

If you try it with the flag '--history_length 1' it should work. I'm appending the timestep to the observation for fitting the value function, and at present it's not coded to deal with the case where the network input is several observations concatenated together.
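
For anyone hitting the same error, here is a minimal sketch of the shape mismatch in isolation. The `np.hstack` call is copied from the traceback; the array layout `(batch, obs_dim, history_length)` and the concrete sizes are assumptions made for illustration, not taken from the repository.

```python
import numpy as np


def preprocess_value_state(state, timestep):
    # Same expression as the failing line in the traceback.
    # For 3-D inputs, np.hstack concatenates along axis 1.
    return np.hstack([state, timestep.reshape(-1, 1, 1)])


# Assumed sizes: 5 samples, 2 observation features.
batch, obs_dim = 5, 2
timestep = np.arange(batch, dtype=np.float32)

# history_length 1 (assumed layout): the last axis has size 1,
# so the reshaped timestep lines up with the state.
state_h1 = np.zeros((batch, obs_dim, 1), dtype=np.float32)
ok = preprocess_value_state(state_h1, timestep)

# history_length 4 (assumed layout): the last axis is 4 for the state
# but 1 for the timestep, so the concatenation is rejected.
state_h4 = np.zeros((batch, obs_dim, 4), dtype=np.float32)
try:
    preprocess_value_state(state_h4, timestep)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Under these assumptions, the first call succeeds and the second raises a `ValueError`, which is consistent with the '--history_length 1' workaround.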

I should refactor this to work with more general history lengths and, in the meantime, raise an explicit error indicating that TRPO expects a history_length of 1.
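
A rough sketch of what both steps might look like follows. The names `check_history_length` and `preprocess_value_state_general` are hypothetical, and the `(batch, obs_dim, history_length)` layout is an assumption; none of this is taken from the repository. Repeating the latest timestep across the history axis is only one possible design choice.

```python
import numpy as np


def check_history_length(history_length):
    # Hypothetical guard: fail early with a readable message instead of
    # the opaque concatenation error deep inside the value prediction.
    if history_length != 1:
        raise ValueError(
            "trpo expects a history_length of 1, got %d" % history_length)


def preprocess_value_state_general(state, timestep):
    # Assumed layout: state is (batch, obs_dim, history_length).
    history = state.shape[2]
    # Repeat the timestep along the history axis so that every axis
    # except the concatenation axis matches the state.
    t = np.tile(timestep.reshape(-1, 1, 1), (1, 1, history))
    return np.concatenate([state, t], axis=1)


state = np.zeros((5, 2, 4))
timestep = np.arange(5, dtype=np.float64)
out = preprocess_value_state_general(state, timestep)

try:
    check_history_length(4)
    guard_raised = False
except ValueError:
    guard_raised = True
```

The generalized version changes the input to `(batch, obs_dim + 1, history_length)`, so whatever consumes it would need to accept that shape.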

captify-alapite commented 7 years ago

That worked great. Thanks again for the quick response!

steveKapturowski / tensorflow-rl

Can't use MountainCarContinuous-v0 with trpo-continuous #4