xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html

AMP: Samples per update iteration, RSI, Number of parallel agents #178

braunjon closed this issue 2 years ago

braunjon commented 2 years ago

Hi

I am considering porting AMP to a different physics engine and would like to ask a few questions to clarify how training is set up.

  1. Is the max. episode length for all tasks in AMP 20s?
  2. How many agents are you running in parallel?
    • Are they starting from the same or a different initial state?
  3. The appendix of the paper lists "Samples Per Update Iteration: 4096"
    • Does each agent gather 4096 samples, or is 4096 the total summed across all agents?
    • After the update, do you reset the environments to new reference states (RSI), or do you continue from the last state until the time reaches the max. episode length?
  4. According to the paper you follow the early-termination scheme from DeepMimic: after a non-foot body part touches the ground, the humanoid receives 0 reward (and 0 style reward), regardless of its actions, until the time reaches the max. episode length. Are the policy and discriminator updated with samples collected after early termination has been triggered?

I appreciate your help!

xbpeng commented 2 years ago

sure thing, here's some info:

  1. Most tasks are 20s long, but for some tasks, like the dribble task, the episodes can be longer (100s).
  2. Each process only simulates one character at a time. But we usually train with multiple processes (e.g. 16). The initial states are usually randomized.
  3. A total of 4096 samples are gathered across all worker processes per iteration. We always finish an episode before performing an update, so we don't truncate an episode partway through in order to update the model (see the collection sketch below).
  4. If the episode terminates early, we don't simulate the rest of the episode, which in effect sets all future rewards to 0 (see the return sketch below).
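
To make 2 and 3 concrete, here's a minimal sketch of the per-iteration sample collection, assuming a simple serial loop in place of the actual worker processes (names like `rollout_episode`, `reset_to_random_reference_state`, and `sample_action` are made up for illustration, not the repo's API):

```python
SAMPLES_PER_ITER = 4096   # total across all workers, per the paper's appendix

def rollout_episode(env, policy, max_steps):
    """Run one complete episode from a randomized reference state (RSI)."""
    samples = []
    state = env.reset_to_random_reference_state()   # hypothetical RSI reset
    for _ in range(max_steps):
        action = policy.sample_action(state)        # hypothetical policy API
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state, done))
        state = next_state
        if done:   # early termination (fall) or end of the episode window
            break
    return samples

def collect_iteration_samples(envs, policy, max_steps):
    """Gather roughly SAMPLES_PER_ITER transitions before each update."""
    batch = []
    # In the real setup each env lives in its own worker process (e.g. 16 of
    # them); a round-robin loop over the envs stands in for that here.
    while len(batch) < SAMPLES_PER_ITER:
        for env in envs:
            batch.extend(rollout_episode(env, policy, max_steps))
            if len(batch) >= SAMPLES_PER_ITER:
                break
    return batch
```

The main points: episodes always start from a randomized reference state, and an episode is always run to completion (or early termination) before the model is updated.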
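
For 4, a rough sketch of how ending the episode at the fall is equivalent to zeroing out all remaining rewards when computing returns (the discount `gamma` and the bootstrapping choice are assumptions for illustration, not the exact training code):

```python
def discounted_returns(rewards, terminated_early, last_state_value, gamma=0.95):
    """Per-step returns for one collected episode.

    If the character fell and the episode was terminated early, we don't
    bootstrap from the value of the final state: every reward after the
    fall is effectively 0. If the episode only ran out of time, using the
    value estimate of the last state is one reasonable choice.
    """
    running = 0.0 if terminated_early else last_state_value
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Since the simulation just stops at the termination point, there are no post-termination samples to feed to the policy or discriminator updates.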