Multiclip rewards reaches a plateau

xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.

https://xbpeng.github.io/projects/DeepMimic/index.html

MIT License


Multiclip rewards reaches a plateau #129

Status: Open. tfederico opened this issue 3 years ago

tfederico commented 3 years ago

Hello,

I tried training the character with the multiclip reward as described in the paper. However, the reward reaches a plateau and the character limps.

Do you have any suggestions as to why this might be happening? Which rewards did you use for multiclip training: the ones in the code or the ones in the paper?

xbpeng commented 3 years ago

The reward in the code should work for imitating multiple walking clips. It's hard to tell what might be going wrong, but as a first guess, make sure that all the reference motions are synchronized. That is, their durations should all be scaled to the same length, so that a single phase variable is valid for every motion.
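As an illustration of that synchronization step, here is a minimal sketch (not code from the DeepMimic repository). It assumes each clip is a list of frames and that the first entry of every frame is that frame's duration in seconds; check this against your own motion files. The names `clip_duration`, `rescale_clip`, `synchronize`, and `phase` are hypothetical helpers introduced only for this example.

```python
from typing import List, Optional

# Assumption: a frame is a flat list of floats whose first entry is the
# frame duration in seconds; the remaining entries are the pose.
Frame = List[float]
Clip = List[Frame]


def clip_duration(frames: Clip) -> float:
    """Total length of a clip, i.e. the sum of its per-frame durations."""
    return sum(frame[0] for frame in frames)


def rescale_clip(frames: Clip, target_duration: float) -> Clip:
    """Uniformly scale frame durations so the clip lasts target_duration.

    Pose entries are copied unchanged; only the timing is stretched.
    """
    total = clip_duration(frames)
    if total <= 0.0:
        raise ValueError("clip has non-positive total duration")
    scale = target_duration / total
    return [[frame[0] * scale] + list(frame[1:]) for frame in frames]


def synchronize(clips: List[Clip], target_duration: Optional[float] = None) -> List[Clip]:
    """Give every clip the same duration (default: the mean duration)."""
    if target_duration is None:
        target_duration = sum(clip_duration(c) for c in clips) / len(clips)
    return [rescale_clip(c, target_duration) for c in clips]


def phase(t: float, duration: float) -> float:
    """Phase in [0, 1) for a looping clip of the given duration."""
    return (t % duration) / duration
```

After `synchronize`, the same value of `phase(t, duration)` refers to the same fraction of the cycle in every clip. Note that stretching a clip in time also changes its playback speed, so clips with very different original durations may end up noticeably faster or slower than the captured motion.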

tfederico commented 3 years ago

I think I am fine on that front. Have you ever tried using the same reward for multiple clips of motions other than walking (e.g., moving the arms)?