Hi,
I'm curious whether recurrent policies for PPO are supported for environments other than atari. I've tried adapting the train_ppo_ale.py code from the atari examples to the mujoco reproduction code, but I'm running into multiple errors. The train_ppo_ale.py script for atari works, but when I switch the policy in train_ppo.py for mujoco to the RecurrentSequential class, I get the following error. Does using recurrent policies require modifications for the mujoco environments?
I would also like to add that, after going through the code in train_ppo_ale.py carefully, the only differences I could find were the use of the RecurrentSequential policy and the recurrent=True argument when initializing the PPO agent. So I'm not sure what further modifications, if any, are needed to run a recurrent policy for mujoco.
The error message suggests that batch_value is a tuple, which is unexpected. Can you show your model definition?
Hi,
I have another update. If I use a model definition following the mujoco code (separate policy and value function), as shown below, the error occurs. However, when I use the atari style of model definition, the code runs fine. Am I making a mistake somewhere with the code above?
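(The code referenced above was not preserved in this thread. For context, here is a minimal sketch of a mujoco-style definition with separate policy and value branches; the observation/action sizes, layer widths, and Gaussian head configuration are assumptions modeled on PFRL's mujoco PPO example, not the exact code from this issue.)

```python
import torch
from torch import nn
import pfrl

obs_size, action_size = 11, 3  # hypothetical environment dimensions

# Branched applies each branch to the same input and returns a tuple
# of (action distribution, state value).
model = pfrl.nn.Branched(
    # Policy branch: outputs a Gaussian action distribution.
    nn.Sequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.Linear(64, action_size),
        pfrl.policies.GaussianHeadWithStateIndependentCovariance(
            action_size=action_size,
            var_type="diagonal",
            var_func=lambda x: torch.exp(2 * x),
            var_param_init=0,
        ),
    ),
    # Value branch: outputs a scalar state value.
    nn.Sequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.Linear(64, 1),
    ),
)
```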
The outermost module must implement the recurrent interface. The outermost module of your former definition is Branched, which does not implement the recurrent interface. That of your latter definition is RecurrentSequential, which does implement it. Replacing Branched in the former definition with RecurrentBranched would resolve the issue.
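A minimal sketch of that fix follows. Assumptions beyond the maintainer's statement: each branch is wrapped in a RecurrentSequential so that it also satisfies the recurrent interface, an nn.LSTM supplies the recurrence, and the sizes and head configuration match the hypothetical example above.

```python
import torch
from torch import nn
import pfrl

obs_size, action_size = 11, 3  # hypothetical environment dimensions

# RecurrentBranched implements the recurrent interface and threads a
# per-branch recurrent state through each child module.
model = pfrl.nn.RecurrentBranched(
    # Policy branch: RecurrentSequential mixes recurrent (nn.LSTM)
    # and ordinary layers.
    pfrl.nn.RecurrentSequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.LSTM(input_size=64, hidden_size=64),
        nn.Linear(64, action_size),
        pfrl.policies.GaussianHeadWithStateIndependentCovariance(
            action_size=action_size,
            var_type="diagonal",
            var_func=lambda x: torch.exp(2 * x),
            var_param_init=0,
        ),
    ),
    # Value branch with its own LSTM.
    pfrl.nn.RecurrentSequential(
        nn.Linear(obs_size, 64),
        nn.Tanh(),
        nn.LSTM(input_size=64, hidden_size=64),
        nn.Linear(64, 1),
    ),
)

opt = torch.optim.Adam(model.parameters(), lr=3e-4)
agent = pfrl.agents.PPO(
    model,
    opt,
    recurrent=True,  # tells PPO to manage the model's recurrent state
)
```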
Thank you, that resolved it!