xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html

AMP: Samples per update iteration, RSI, Number of parallel agents #178

braunjon closed this issue 2 years ago

braunjon commented 2 years ago

Hi

I am considering porting AMP to a different physics engine and would like to ask a few questions to clarify how training is set up.

  1. Is the max. episode length for all tasks in AMP 20s?
  2. How many agents are you running in parallel?
    • Are they starting from the same or a different initial state?
  3. The appendix of the paper lists "Samples Per Update Iteration: 4096"
    • Does each agent gather 4096 samples, or is 4096 the total summed across all agents?
    • After the update, do you reset the environments to new reference states (RSI), or do you continue from the last state until the time reaches the max. episode length?
  4. According to the paper you follow the early-termination scheme from DeepMimic: after a non-foot body part touches the ground, the humanoid receives 0 reward (and 0 style reward), regardless of its actions, until the time reaches the max. episode length. Are the policy and discriminator updated with samples collected after early termination has been triggered?

I appreciate your help!

xbpeng commented 2 years ago

sure thing, here's some info:

  1. Most tasks are 20s long, but for some tasks, like the dribble task, the episodes can be longer (100s).
  2. Each process only simulates one character at a time. But we usually train with multiple processes (e.g. 16). The initial states are usually randomized.
  3. A total of 4096 samples are gathered across all worker processes per iteration. We always finish an episode before performing an update, so we don't truncate an episode partway through in order to update the model (see the collection sketch below).
  4. If the episode terminates early, we don't simulate the rest of the episode, which in effect sets all future rewards to 0 (see the return sketch below).
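
To make 2 and 3 concrete, here's a minimal sketch of the per-iteration sample collection, assuming a simple serial loop in place of the actual worker processes (names like `rollout_episode`, `reset_to_random_reference_state`, and `sample_action` are made up for illustration, not the repo's API):

```python
SAMPLES_PER_ITER = 4096   # total across all workers, per the paper's appendix

def rollout_episode(env, policy, max_steps):
    """Run one complete episode from a randomized reference state (RSI)."""
    samples = []
    state = env.reset_to_random_reference_state()   # hypothetical RSI reset
    for _ in range(max_steps):
        action = policy.sample_action(state)        # hypothetical policy API
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state, done))
        state = next_state
        if done:   # early termination (fall) or end of the episode window
            break
    return samples

def collect_iteration_samples(envs, policy, max_steps):
    """Gather roughly SAMPLES_PER_ITER transitions before each update."""
    batch = []
    # In the real setup each env lives in its own worker process (e.g. 16 of
    # them); a round-robin loop over the envs stands in for that here.
    while len(batch) < SAMPLES_PER_ITER:
        for env in envs:
            batch.extend(rollout_episode(env, policy, max_steps))
            if len(batch) >= SAMPLES_PER_ITER:
                break
    return batch
```

The main points: episodes always start from a randomized reference state, and an episode is always run to completion (or early termination) before the model is updated.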
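
For 4, a rough sketch of how ending the episode at the fall is equivalent to zeroing out all remaining rewards when computing returns (the discount `gamma` and the bootstrapping choice are assumptions for illustration, not the exact training code):

```python
def discounted_returns(rewards, terminated_early, last_state_value, gamma=0.95):
    """Per-step returns for one collected episode.

    If the character fell and the episode was terminated early, we don't
    bootstrap from the value of the final state: every reward after the
    fall is effectively 0. If the episode only ran out of time, using the
    value estimate of the last state is one reasonable choice.
    """
    running = 0.0 if terminated_early else last_state_value
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Since the simulation just stops at the termination point, there are no post-termination samples to feed to the policy or discriminator updates.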