xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License

Implementing env.setpose() in Python #41

Open young-j-park opened 5 years ago

young-j-park commented 5 years ago

I want to vary the initial state of the humanoid using my custom motion capture data. In other words, I want to implement something like env.SetPose() in Python with minimal modifications to the DeepMimicCore files.

But I could not really figure out how RSI (reference state initialization) is implemented in DeepMimicCore and DeepMimicEnv. Can you give me some clues about which code I need to read through?

mariolew commented 5 years ago

I have a similar question, since I'm trying to implement ASI. I found that ASI requires an interface to directly set the state of the env, and I haven't figured out how to do that. I'd appreciate any hints on implementing ASI.

mariolew commented 5 years ago

@yjparkLiCS You can refer to cKinCharacter::Pose(double time). However, I need something like setState. While ASI seems easy to understand, I found it hard to implement. I may have gotten something wrong, but I really cannot find an easy way to implement ASI.
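A kinematic pose lookup like cKinCharacter::Pose(double time) presumably evaluates the reference motion at time t by interpolating between neighboring keyframes. The sketch below stores keyframes as rows of a NumPy array with per-segment durations and linearly interpolates every coordinate. A real implementation would need to slerp quaternion joints, so treat `pose_at_time` as a hypothetical helper, not DeepMimicCore's code.

```python
import numpy as np


def pose_at_time(frames, durations, t, loop=True):
    """Evaluate a keyframed motion at time t (hypothetical helper).

    frames:    (N, D) array, one reduced-coordinate pose per keyframe
    durations: (N-1,) array, time from keyframe i to keyframe i+1
    Every coordinate is linearly interpolated; quaternion joints would
    need slerp in a real implementation.
    """
    frames = np.asarray(frames, dtype=float)
    durations = np.asarray(durations, dtype=float)
    total = durations.sum()
    # Wrap around for looping motions, otherwise clamp to the clip.
    t = t % total if loop else min(max(t, 0.0), total)
    starts = np.concatenate(([0.0], np.cumsum(durations)[:-1]))
    i = min(int(np.searchsorted(starts, t, side="right")) - 1, len(durations) - 1)
    lerp = (t - starts[i]) / durations[i]
    return (1.0 - lerp) * frames[i] + lerp * frames[i + 1]


# RSI-style usage: evaluate the reference motion at a uniformly random time.
rng = np.random.default_rng(0)
frames = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
durations = [1.0, 1.0]
t0 = rng.uniform(0.0, sum(durations))
init_pose = pose_at_time(frames, durations, t0)
```

Under this reading, RSI amounts to sampling the time uniformly over the clip and copying the resulting pose (plus a matching velocity) onto the simulated character.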

xbpeng commented 5 years ago

@yjparkLiCS RSI is implemented in the scene's reset code in DeepMimicCore:
https://github.com/xbpeng/DeepMimic/blob/676185cde1bf03790518d3e311ee27dd322bc007/DeepMimicCore/scenes/SceneImitate.cpp#L320
It first resets the kinematic character to a random point along the motion, then synchronizes the simulated character with the kinematic character's pose. So there currently isn't a Python interface for that, but it shouldn't be too hard to implement. You can take a look at how actions are communicated from Python to C++:
https://github.com/xbpeng/DeepMimic/blob/676185cde1bf03790518d3e311ee27dd322bc007/DeepMimicCore/DeepMimicCore.cpp#L221
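The Python side of a pose setter could follow the same path as actions: validate the arrays, convert them to plain lists, and forward them to a new core method. The action path appears to pass `action.tolist()` to the core. `SetPose(agent_id, pose, vel)` below is hypothetical. It would still have to be added to DeepMimicCore.cpp and exported through the SWIG wrapper. `FakeCore` only stands in for the compiled core so the sketch runs.

```python
import numpy as np


class FakeCore:
    """Stand-in for the compiled C++ core; SetPose is a hypothetical method."""

    def __init__(self):
        self.calls = []

    def SetPose(self, agent_id, pose, vel):
        # A real implementation would write the pose and velocity into the
        # simulated character and keep the kinematic character in sync.
        self.calls.append((agent_id, pose, vel))


class PoseSettingEnv:
    """Minimal env wrapper exposing set_pose() the way set_action() is exposed."""

    def __init__(self, core):
        self._core = core

    def set_pose(self, agent_id, pose, vel):
        pose = np.asarray(pose, dtype=float)
        vel = np.asarray(vel, dtype=float)
        if pose.ndim != 1 or vel.ndim != 1:
            raise ValueError("pose and vel must be 1-D arrays")
        if not (np.isfinite(pose).all() and np.isfinite(vel).all()):
            raise ValueError("pose and vel must be finite")
        # Assumption: plain lists are passed, as for actions, so a SWIG-wrapped
        # std::vector<double> argument can accept them.
        self._core.SetPose(agent_id, pose.tolist(), vel.tolist())


env = PoseSettingEnv(FakeCore())
env.set_pose(0, [0.0, 0.9, 0.0], [0.0, 0.0, 0.0])
```

Rejecting non-finite values on the Python side keeps bad samples from ever reaching the simulator.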

@mariolew SetState would be pretty difficult to implement, because recovering the reduced coordinate pose from the state features is pretty tricky. For our implementation of ASI, the initial state distribution actually specifies the reduced coordinate pose instead of the policy state. That way, we just get the pose from the initial state distribution, then call SetPose to initialize the character.

mariolew commented 5 years ago

@xbpeng Hi Jason, thanks for your patience. As I understand it, the state cannot be determined by the pose alone; it is determined by both pose and velocity. So I think setting only the pose is not enough, and I'm still a little confused.

xbpeng commented 5 years ago

Sorry, by pose I meant pose and velocity. So the initial state distribution specifies both the pose and the velocity in reduced coordinates.
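Under that description, an ASI-style initial state distribution can be sketched as a diagonal Gaussian over the concatenated reduced-coordinate pose and velocity. Each sample is split back into the two parts and handed to a pose setter. The Gaussian form, the class name, and the `set_pose(agent_id, pose, vel)` call are assumptions for illustration, not the released implementation.

```python
import numpy as np


class InitialStateDistribution:
    """Diagonal Gaussian over [pose, vel] in reduced coordinates (sketch)."""

    def __init__(self, mean_pose, mean_vel, std_pose, std_vel):
        self.pose_dim = len(mean_pose)
        self.mean = np.concatenate([mean_pose, mean_vel]).astype(float)
        # Learnable parameters: the mean and a log-std per dimension.
        self.log_std = np.log(np.concatenate([std_pose, std_vel]).astype(float))

    def sample(self, rng):
        x = self.mean + np.exp(self.log_std) * rng.standard_normal(self.mean.size)
        # Split the flat sample back into pose and velocity.
        return x[: self.pose_dim], x[self.pose_dim :]


dist = InitialStateDistribution(
    mean_pose=[0.0, 0.9, 0.0], mean_vel=[0.0, 0.0],
    std_pose=[0.05, 0.05, 0.05], std_vel=[0.1, 0.1],
)
pose, vel = dist.sample(np.random.default_rng(1))
# env.set_pose(agent_id, pose, vel)  # hypothetical setter; not part of DeepMimicEnv
```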

mariolew commented 5 years ago

@xbpeng Thanks, I got it, let me try.

mariolew commented 5 years ago

I've implemented ASI according to my understanding. However, after the model trains for a few iterations, an invalid path problem always occurs. My guess is that the initial state distribution model is not well initialized, so I'd like to know how to initialize or normalize its parameters. BTW, it does not seem to be common practice to optimize the mean and std simultaneously.

xbpeng commented 5 years ago

it could be that the initial velocities are too large. There might be some scaling issue with the standard deviation for the initial state distribution. Maybe you should reduce the scale for the parameters corresponding to the velocities. It would also be helpful to take a look at what the learned initial states look like.
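One way to apply that scaling is to keep the learnable parameters unitless and multiply them by a fixed per-dimension scale before they become the mean offset and standard deviation. This is an assumption, not necessarily what the original implementation does. A smaller scale on the velocity dimensions keeps sampled velocities small even when the raw parameters take unit-sized steps. The scale values 1.0 and 0.1, and anchoring the mean at a reference state, are illustrative choices.

```python
import numpy as np


def make_scale(pose_dim, vel_dim, pose_scale=1.0, vel_scale=0.1):
    """Per-dimension scale; velocity dims get a smaller one (values are illustrative)."""
    return np.concatenate([np.full(pose_dim, pose_scale), np.full(vel_dim, vel_scale)])


def initial_state_params(raw_mean, raw_log_std, ref_state, scale):
    """Map unitless learnable parameters to the Gaussian's mean and std."""
    mean = ref_state + scale * raw_mean   # offset from a reference state
    std = scale * np.exp(raw_log_std)     # std is scaled the same way
    return mean, std


scale = make_scale(pose_dim=3, vel_dim=2)
ref_state = np.array([0.0, 0.9, 0.0, 0.0, 0.0])
mean, std = initial_state_params(np.ones(5), np.zeros(5), ref_state, scale)
```

With raw parameters all at 1 and 0, the velocity entries move by only 0.1 while the pose entries move by 1.0.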

Re: learning the mean and std: it is fairly common to learn both the mean and the std for Gaussian policies. However, instead of a state-dependent std, people often just learn a fixed std shared across all states.
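The log-density of a diagonal Gaussian shows why learning both parameters works. It is differentiable in both the mean and the log-std, so both can be updated together by gradient ascent. The sketch below uses a state-independent log-std vector, as described above. A linear map stands in for the mean network, which is an assumption for brevity.

```python
import numpy as np


def gaussian_log_prob_and_grads(a, mean, log_std):
    """Log-density of a diagonal Gaussian and its gradients w.r.t. mean and log_std."""
    std = np.exp(log_std)
    z = (a - mean) / std
    log_p = -0.5 * np.sum(z**2) - np.sum(log_std) - 0.5 * a.size * np.log(2.0 * np.pi)
    d_mean = z / std          # d log_p / d mean
    d_log_std = z**2 - 1.0    # d log_p / d log_std
    return float(log_p), d_mean, d_log_std


rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((2, 4))   # state-dependent mean: mean = W @ s
log_std = np.full(2, -1.0)              # one std vector shared by all states
s = rng.standard_normal(4)
a = W @ s + np.exp(log_std) * rng.standard_normal(2)
log_p, d_mean, d_log_std = gaussian_log_prob_and_grads(a, W @ s, log_std)
d_W = np.outer(d_mean, s)               # chain rule through mean = W @ s
```

Because `log_std` does not depend on `s`, its gradient is simply summed (or averaged) over all states in a batch.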

mariolew commented 5 years ago

@xbpeng Thanks for your patience. However, in my case the velocities seem to be too small. And the actions and rewards easily become NaN.

I think the initial root position should be constrained. Otherwise it is easy to produce invalid poses, and training becomes very time-consuming, since the initial state model is called frequently. Right now I just initialize the parameters of the initial state distribution network with Xavier initialization, and the results are not good.

As you suggested, I should apply some scaling, but I'm a little confused about how to add it.
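One possible way to build that scaling into the network is sketched below, under assumptions; it is not the DeepMimic code. Initialize the last layer with Xavier (Glorot) uniform weights and squash its output with tanh. Multiply by a per-dimension scale so each coordinate stays within ±scale of a reference state. Then mask out the root-position dimensions so they stay at the reference. Which indices hold the root position, and the scale values, depend on the character and are placeholders here.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: U(-l, l) with l = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))


class ScaledOutputLayer:
    """Output layer for an initial-state network (sketch)."""

    def __init__(self, in_dim, scale, fixed_dims, rng):
        self.scale = np.asarray(scale, dtype=float)
        self.W = xavier_uniform(in_dim, self.scale.size, rng)
        self.b = np.zeros(self.scale.size)
        # 0 for dimensions held at the reference value (e.g. root position).
        self.mask = np.ones(self.scale.size)
        self.mask[np.asarray(fixed_dims, dtype=int)] = 0.0

    def __call__(self, h, ref_state):
        offset = self.scale * np.tanh(self.W @ h + self.b)  # |offset_i| <= scale_i
        return ref_state + self.mask * offset


rng = np.random.default_rng(0)
scale = [0.1, 0.1, 0.1, 0.2, 0.2, 0.05, 0.05]   # placeholder values
layer = ScaledOutputLayer(in_dim=8, scale=scale, fixed_dims=[0, 1, 2], rng=rng)
ref_state = np.zeros(7)
out = layer(rng.standard_normal(8), ref_state)
```

The tanh bound means no setting of the weights can push a coordinate outside its allowed range, which avoids the extreme initial states an unbounded linear output can produce.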

For example (spinkick), I got NaN actions when the state is:
[ 9.60000000e-01, 1.60155465e-01, -4.69839043e-02, -3.58636129e-02, -3.75008397e-02, 1.54158354e-01, 4.62781310e-01, -4.69112426e-01, 7.36207962e-01, -4.70028277e-02, -3.58250934e-02, -3.74499364e-02, 5.02464467e-01, 3.61715954e-01, -7.35193754e-01, -2.76009380e-01, -4.70032879e-02, -3.58252573e-02, -3.74497416e-02, 9.15735492e-02, 7.86775691e-01, -5.77224462e-01, -1.98520203e-01, -4.70920018e-02, -3.60949683e-02, -3.72445163e-02, 7.92816709e-02, -2.19381644e-01, -9.00024093e-01, 3.68161314e-01, -4.73145923e-02, -3.65458700e-02, -3.67547326e02, 8.74833585e-02, -1.99228354e-01, -9.04698715e-01, 3.66299061e-01, -4.73178606e-02, -3.65472260e-02, -3.67544635e-02, 3.76245721e-01, 2.80510239e-02, 1.79721891e-01, -9.08489024e-01, -4.70001441e-02, -3.58224633e-02, -3.74516999e-02, 2.23052492e-01, 8.86852536e-01, -4.03551465e-01, -2.97739590e-02, -4.69944621e-02, -3.58182165e-02, -3.74549273e-02, 2.24125115e-01, 8.68708152e-01, -4.41256508e-01, -2.01697111e-02, -4.69912395e-02, -3.58160484e-02, -3.74566561e-02, 2.24125115e-01, 8.68708152e-01, -4.41256508e-01, -2.01697111e-02, -4.71872681e-02, -3.59702829e-02, -3.74965572e-02, 8.02727074e-01, 1.77987145e-01, 3.12929863e-01, -4.75420498e-01, -4.75942850e-02, -3.61719403e-02, -3.74722931e-02, 7.70823422e-01, 1.57698727e-01, 3.23628863e-01, -5.25572796e-01, -4.75937263e-02, -3.61716944e-02, -3.74738162e-02, 8.07129577e-01, 2.05337666e-01, 4.93269600e-01, 2.51124240e-01, -4.70028149e-02, -3.58260173e-02, -3.74497503e-02, 9.01060154e-01, -8.25475574e-02, 4.24029291e-01, -3.84134795e-02, -4.70028676e-02, -3.58276936e-02, -3.74493767e-02, 8.97828178e-01, -1.04561448e-01, 4.19144164e-01, -8.53795364e-02, -4.70029284e-02, -3.58285952e-02, -3.74491372e-02, 8.97828178e-01, -1.04561448e-01, 4.19144164e-01, -8.53795364e-02, -8.34437006e-04, -5.36788813e-04, -1.14352506e-03, 2.56446989e-03, 3.01099915e-03, -2.76310793e-03, 9.39669676e-03, 6.34754308e-03, -2.96169260e-03, -6.14636620e-02, 8.13663954e-02, -3.91604937e-02, 8.82284386e-03, 1.08256978e-02, -6.46731464e-04, 1.69517244e-02, 2.13868524e-02, -3.63975715e-02, 4.21981634e-03, 1.64045135e-02, 1.66838297e-02, -1.43804355e-01, -1.78993489e-02, 5.79070001e-02, 1.40736872e-02, 5.02790495e-02, 5.23587639e-02, -1.44328292e-01, -1.89806522e-02, 5.66734281e-02, 1.64322334e-02, 4.47093608e-02, 4.90418190e-02, -9.11145279e-02, -1.14033955e-01, 1.27716675e-01, 1.87607806e-02, -9.28976210e-03, -9.32213671e-03, -5.60209684e-02, 2.43168643e-02, -1.30442699e-01, 4.36467568e-02, -5.13114581e-02, -1.91069926e-02, -8.58005471e-02, -2.32102715e-02, -2.45388488e-01, 6.23530617e-02, -8.19972694e-02, -2.27452450e-02, -8.58005471e-02, -2.32102715e-02, -2.45388488e-01, 4.39297657e-03, -1.00756931e-02, -5.11403796e-03, -2.78467148e-02, -3.72033007e-02, 5.22881232e-02, 8.39131616e-03, -1.46799648e-02, -5.74434724e-04, -6.07807740e-02, 2.04581827e-02, -2.09423446e-02, 9.64379209e-03, -1.89877856e-02, -6.17071664e-04, -1.33458311e-01, -4.33636506e-02, 6.49559805e-02, -8.28936619e-03, 6.73560392e-03, -1.18976068e-03, -1.44945118e-02, 1.84881797e-01, -1.63856706e-01, -4.58382432e-02, 1.15340477e-02, 1.21367458e-02, -8.69524415e-02, 1.73955761e-01, -2.22798738e-01, -6.93990598e-02, 1.66998459e-02, 2.53652373e-02, -8.69524415e-02, 1.73955761e-01, -2.22798738e-01]
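When a policy returns NaN actions for a particular state, a quick first check is to screen that state for non-finite or unusually large entries before it reaches the network. The threshold below is an arbitrary placeholder, and `check_state` is a hypothetical helper, not part of DeepMimic.

```python
import numpy as np


def check_state(state, max_abs=10.0):
    """Return (indices of non-finite entries, indices of entries with |x| > max_abs)."""
    s = np.asarray(state, dtype=float)
    finite = np.isfinite(s)
    nonfinite_idx = np.flatnonzero(~finite)
    # Zero out non-finite entries first so the magnitude test only sees finite values.
    large_idx = np.flatnonzero(np.abs(np.where(finite, s, 0.0)) > max_abs)
    return nonfinite_idx, large_idx


state = [0.96, 0.16, float("nan"), -0.047, 500.0, float("inf")]
nonfinite_idx, large_idx = check_state(state)
if nonfinite_idx.size or large_idx.size:
    print("non-finite at", nonfinite_idx.tolist(), "| large at", large_idx.tolist())
```

Running the same check on the sampled initial pose and velocity, before they are applied to the character, helps separate a bad initial state from an instability that develops during simulation.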

young-j-park commented 5 years ago

@xbpeng Thank you. I just solved my problem: I wrote code that updates kinchar.mMotion and a few variables related to the motion data within kinchar.
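An alternative to patching kinchar.mMotion in C++ is to convert the custom mocap into DeepMimic's own motion-file format and load it like the bundled clips. The sketch assumes the layout described in the repository README: a JSON object with a "Loop" field ("wrap" or "none") and a "Frames" list. Each frame starts with its duration in seconds, followed by root position (3D), root rotation as a quaternion (w, x, y, z), and then the joint rotations. Building per-frame vectors in the correct joint order for your character is up to you, and the helper names are hypothetical.

```python
import json


def build_motion(poses, frame_dt, loop="wrap"):
    """Pack pose vectors into a DeepMimic-style motion dict (assumed format).

    poses: iterable of per-frame vectors laid out as
           [root pos (3), root rot quat w,x,y,z (4), joint rotations...]
    frame_dt: duration of each frame in seconds, prepended to every frame.
    loop: "wrap" for cyclic motions, "none" for acyclic ones.
    """
    if loop not in ("wrap", "none"):
        raise ValueError("loop must be 'wrap' or 'none'")
    frames = [[float(frame_dt)] + [float(x) for x in pose] for pose in poses]
    return {"Loop": loop, "Frames": frames}


def save_motion(path, motion):
    with open(path, "w") as f:
        json.dump(motion, f, indent=2)


# Two dummy frames with only root position and root rotation (identity quaternion);
# a real humanoid file also needs every joint's rotation in each frame.
motion = build_motion(
    [[0.0, 0.9, 0.0, 1.0, 0.0, 0.0, 0.0], [0.05, 0.9, 0.0, 1.0, 0.0, 0.0, 0.0]],
    frame_dt=1.0 / 30.0,
)
text = json.dumps(motion)
```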

I will keep this issue open until @mariolew solves his issue :)

Zju-George commented 5 years ago

@mariolew @yjparkLiCS @xbpeng Have you implemented ASI? I ran into the same question after trying my own mocap data: the results with RSI are not good because of the low quality of my motion. If you did, could you please give me a hint on how to implement ASI, or kindly share your implementation? Here is my original mocap data after smoothing, and the result after 2000 iterations.
[Images: jump_hop_original, jump_hop_2000iters]