xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License
2.31k stars, 488 forks

Some questions about implementing ASI #190

Closed: PeterWangyi closed this issue 6 months ago

PeterWangyi commented 1 year ago

Some questions about implementing ASI: I followed issue #41 to implement my own ASI, but there are still some parts I don't understand, so I would like to ask for advice.

First, as mentioned in issue #41: "For our implementation of ASI, the initial state distribution actually specifies the reduced coordinate pose instead of the policy state. That way, we just get the pose from the initial state distribution, then call SetPose to initialize the character." The ASI in the paper describes this initial distribution as an independent Gaussian distribution at each phase, so how do we actually get the pose from the initial state distribution? I have thought about it for a long time and could not figure it out.

Also, as I understand it, the purpose of ASI is to optimize the initial action of each frame so that quantities such as the velocity are not too extreme, which would otherwise lead to invalid episodes. I am not sure whether I have understood this correctly.

Secondly, when I was writing the C++ interface for ASI, I found that when env.reset is called, this SetPose https://github.com/xbpeng/DeepMimic/blob/676185cde1bf03790518d3e311ee27dd322bc007/DeepMimicCore/sim/SimCharacter.cpp#L725C57-L725C57 is executed 4 times.

[Image: output]

In my understanding, if I pass in a pose and assign a value to each part, SetPose should only be called once. Why is it called 4 times? Also, the pose passed to each SetPose call is different, as shown in the figure below.

I would be very grateful for an explanation.

xbpeng commented 1 year ago

If the initial state distribution is implemented as Gaussian distributions at different phases of the motion, then to get the initial pose at a particular phase, you can just sample from the Gaussian for that phase. Is your question about how to sample from the Gaussians?
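As a rough illustration of "just sample from the Gaussian", here is a minimal Python sketch using only the standard library. It assumes the phase is a number in [0, 1) that is discretized into equal-width bins, and that each bin stores a mean and a standard deviation per dimension. The names `phase_to_bin`, `sample_initial_state`, `means`, and `stds` are hypothetical and are not part of DeepMimic.

```python
import random

def phase_to_bin(phase, num_bins):
    """Map a phase in [0, 1) to one of num_bins equal-width bins."""
    return min(int(phase * num_bins), num_bins - 1)

def sample_initial_state(means, stds, phase, rng):
    """Draw one state from the independent Gaussians stored for this phase.

    means[k][i] and stds[k][i] are the mean and standard deviation of
    dimension i at phase bin k (hypothetical layout, for illustration only).
    """
    k = phase_to_bin(phase, len(means))
    # Each dimension is sampled on its own, which is what "independent" means here.
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

# Toy distribution: 4 phase bins, 3-dimensional state.
means = [[float(k)] * 3 for k in range(4)]
stds = [[0.1] * 3 for _ in range(4)]

rng = random.Random(0)
state = sample_initial_state(means, stds, phase=0.6, rng=rng)
print(state)  # three values scattered around 2.0, the mean of bin 2
```

In an actual ASI implementation, the sampled vector would then be handed to the character initialization (SetPose in this thread) rather than printed.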

Yes, it's expected that SetPose is called multiple times. Since each scene inherits methods from its parents, some of the parent reset methods might also try to set the pose to something different.
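The effect described here can be reproduced with a small, self-contained Python sketch. The class names below (`BaseScene`, `MidScene`, `LeafScene`) are invented for illustration and do not correspond to the actual DeepMimic scene classes; the point is only that each level of an inheritance chain may set its own pose during reset, so the setter runs several times with different arguments.

```python
class BaseScene:
    def __init__(self):
        self.set_pose_calls = []

    def set_pose(self, pose, caller):
        # Record every call so the sequence can be inspected afterwards.
        self.set_pose_calls.append((caller, pose))

    def reset(self):
        self.set_pose("default pose", "BaseScene.reset")


class MidScene(BaseScene):
    def reset(self):
        super().reset()  # the parent sets its own pose first
        self.set_pose("reference motion pose", "MidScene.reset")


class LeafScene(MidScene):
    def __init__(self, initial_pose):
        super().__init__()
        self.initial_pose = initial_pose

    def reset(self):
        super().reset()  # both ancestors run before this level
        self.set_pose(self.initial_pose, "LeafScene.reset")


scene = LeafScene("pose sampled by ASI")
scene.reset()
for caller, pose in scene.set_pose_calls:
    print(caller, "->", pose)
```

With three levels, one reset produces three calls with three different poses. In this sketch the call made by the most derived class runs last, so its pose is the one that remains in effect.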

PeterWangyi commented 1 year ago

Thank you for your reply. Regarding the first question: maybe my understanding of the Gaussian distribution is somewhat lacking. I observed that both SetPose and SetVel take 43-dimensional vectors, so I should pass in an 86-dimensional vector during initialization.

So is this 86-dimensional vector the result of sampling from the initial state distribution? If so, should the distribution at each phase be modeled as an 86-dimensional Gaussian distribution, with each dimension independent? The output of the network is the mean of the distribution at the current phase, which is also an 86-dimensional "mu". Combined with a diagonal covariance matrix that I set, can the state (pose + vel) required by ASI be sampled directly from this distribution?

If there are 50 phases, will this initial state distribution consist of 50 * 86 independent Gaussian distributions? I am not sure whether my understanding is correct.

Sorry for writing so much. It might be complicated to read; thanks for your patience!

xbpeng commented 1 year ago

Yes, your understanding is correct. If you have 50 phases, then this can be modeled as 50 * 86 independent Gaussian distributions.
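To make the bookkeeping concrete, the following Python sketch lays out the 50 * 86 parameters as two tables and splits one sample into the 43-dimensional pose and 43-dimensional velocity discussed above. The table names `mu` and `sigma`, the placeholder values, and the helper `sample_pose_vel` are assumptions made for illustration. In the setup described in this thread the means would come from the network and the standard deviations from a hand-set diagonal covariance.

```python
import random

NUM_PHASES = 50
POSE_DIM = 43                    # size of the vector passed to SetPose
VEL_DIM = 43                     # size of the vector passed to SetVel
STATE_DIM = POSE_DIM + VEL_DIM   # 86

def make_table(value):
    """Build a NUM_PHASES x STATE_DIM table filled with one placeholder value."""
    return [[value] * STATE_DIM for _ in range(NUM_PHASES)]

mu = make_table(0.0)       # placeholder means, one per (phase, dimension)
sigma = make_table(0.05)   # placeholder standard deviations (diagonal covariance)

# One scalar Gaussian per table entry.
num_gaussians = sum(len(row) for row in mu)
print(num_gaussians)  # 4300

def sample_pose_vel(phase_index, rng):
    """Sample the 86-dimensional state for one phase and split it in two."""
    state = [rng.gauss(m, s) for m, s in zip(mu[phase_index], sigma[phase_index])]
    return state[:POSE_DIM], state[POSE_DIM:]

pose, vel = sample_pose_vel(10, random.Random(0))
print(len(pose), len(vel))  # 43 43
```

Because the covariance is diagonal, sampling all 86 entries one by one is equivalent to sampling from a single 86-dimensional Gaussian for that phase.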