Hi,
I'm curious whether recurrent policies for PPO are supported for environments other than atari. I've tried adapting the train_ppo_ale.py code from the atari examples to the mujoco reproduction code, but I'm running into multiple errors. The train_ppo_ale.py script for atari works, but when I switch the policy in train_ppo.py for mujoco to the RecurrentSequential class, I get the following error. Does using recurrent policies require modifications for the mujoco environments?
I would also like to add that, after going through the code in train_ppo_ale.py carefully, the only differences I could find were the use of the RecurrentSequential policy and the recurrent=True argument when initializing the PPO agent. So I'm not sure what further modifications, if any, are needed to run a recurrent policy for mujoco.
The error message suggests that batch_value is a tuple, which is unexpected. Can you show your model definition?
Hi,
I have another update. If I use a model definition following the mujoco code (separate policy and value function), as shown below, the error occurs. However, when I use the atari style of model definition, the code runs fine. Am I making a mistake somewhere with the code above?
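(The code referenced above was not preserved in this thread. For context, here is a minimal sketch of a mujoco-style definition with separate policy and value branches; the observation/action sizes, layer widths, and Gaussian head configuration are assumptions modeled on PFRL's mujoco PPO example, not the exact code from this issue.)

```python
import torch
from torch import nn
import pfrl

obs_size, action_size = 11, 3  # hypothetical environment dimensions

# Branched applies each branch to the same input and returns a tuple
# of (action distribution, state value).
model = pfrl.nn.Branched(
    # Policy branch: outputs a Gaussian action distribution.
    nn.Sequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.Linear(64, action_size),
        pfrl.policies.GaussianHeadWithStateIndependentCovariance(
            action_size=action_size,
            var_type="diagonal",
            var_func=lambda x: torch.exp(2 * x),
            var_param_init=0,
        ),
    ),
    # Value branch: outputs a scalar state value.
    nn.Sequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.Linear(64, 1),
    ),
)
```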
The outermost module must implement the recurrent interface. The outermost module of your former definition is Branched, which does not implement the recurrent interface. That of your latter definition is RecurrentSequential, which does implement it. Replacing Branched in the former definition with RecurrentBranched would resolve the issue.
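A minimal sketch of that fix follows. Assumptions beyond the maintainer's statement: each branch is wrapped in a RecurrentSequential so that it also satisfies the recurrent interface, an nn.LSTM supplies the recurrence, and the sizes and head configuration match the hypothetical example above.

```python
import torch
from torch import nn
import pfrl

obs_size, action_size = 11, 3  # hypothetical environment dimensions

# RecurrentBranched implements the recurrent interface and threads a
# per-branch recurrent state through each child module.
model = pfrl.nn.RecurrentBranched(
    # Policy branch: RecurrentSequential mixes recurrent (nn.LSTM)
    # and ordinary layers.
    pfrl.nn.RecurrentSequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.LSTM(input_size=64, hidden_size=64),
        nn.Linear(64, action_size),
        pfrl.policies.GaussianHeadWithStateIndependentCovariance(
            action_size=action_size,
            var_type="diagonal",
            var_func=lambda x: torch.exp(2 * x),
            var_param_init=0,
        ),
    ),
    # Value branch with its own LSTM.
    pfrl.nn.RecurrentSequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.LSTM(input_size=64, hidden_size=64),
        nn.Linear(64, 1),
    ),
)

opt = torch.optim.Adam(model.parameters(), lr=3e-4)
agent = pfrl.agents.PPO(
    model,
    opt,
    recurrent=True,  # tells PPO to manage the model's recurrent state
)
```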
Thank you, that resolved it!