lsw9021 / MASS


Why are positions from BVH required after learning? #7

Open efelem opened 5 years ago

efelem commented 5 years ago

Hi,

After training a walking gait for several days and observing the results, the gait seems fine. However, I noticed that the BVH file is still needed to run the network after optimization. I am a bit confused, because the paper claims:

Note that PD targets serve as an intermediary between the two network policies in the learning process, but they are not used in actual simulations. Our simulation and control system at runtime is solely muscle-actuated requiring neither PD targets nor PD control at all.

Looking further, I noticed that the call to GetActivationFromNN in the Window::Step() method makes use of the target positions: inside it there is a call to Environment::GetDesiredTorques(), and in that method mTargetPositions is used, with the output of the mimicking network, mAction, added to the desired positions provided by mTargetPositions.
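For reference, here is a minimal sketch of what that torque computation appears to do, reconstructed from my reading of the code; the gain names mKp/mKv, the 6-DOF root offset, and the PD details are my assumptions, not a verbatim excerpt:

Eigen::VectorXd
Environment::
GetDesiredTorques()
{
    // The PD target is the reference (BVH) pose plus the scaled network
    // output, so the network only supplies an offset on the reference.
    Eigen::VectorXd p_des = mTargetPositions;
    p_des.tail(p_des.rows()-6) += mAction; // assumed: 6-DOF root left untouched

    // Assumed PD-style torque computation; the repository may instead
    // delegate to a stable-PD routine on the character.
    Eigen::VectorXd q  = mCharacter->GetSkeleton()->getPositions();
    Eigen::VectorXd dq = mCharacter->GetSkeleton()->getVelocities();
    return mKp.cwiseProduct(p_des - q) - mKv.cwiseProduct(dq);
}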

mTargetPositions is set in the Environment::SetAction method, which looks like this:

void 
Environment::
SetAction(const Eigen::VectorXd& a)
{
    // Scale the raw network output before it is combined with the
    // reference pose.
    mAction = a*0.1;

    double t = mWorld->getTime();

    // Query the reference motion (loaded from the BVH) for the target
    // pose and velocity at the current simulation time.
    std::pair<Eigen::VectorXd,Eigen::VectorXd> pv = mCharacter->GetTargetPosAndVel(t,1.0/mControlHz);
    mTargetPositions = pv.first;
    mTargetVelocities = pv.second;

    mSimCount = 0;
    mRandomSampleIndex = rand()%(mSimulationHz/mControlHz);
    mAverageActivationLevels.setZero();
}

I noticed two things: 1) mTargetPositions comes from Character::GetTargetPosAndVel, which explicitly uses the BVH file, and 2) the 0.1 factor multiplies the output of the mimicking controller. A sketch of point 1) follows below.
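To make point 1) concrete, here is roughly how I understand Character::GetTargetPosAndVel to behave; the mBVH->GetMotion accessor name is my guess for illustration:

std::pair<Eigen::VectorXd,Eigen::VectorXd>
Character::
GetTargetPosAndVel(double t,double dt)
{
    // Sample the reference motion stored in the BVH at t and t+dt; the
    // finite difference of the two poses gives the target velocity.
    // (The real code may difference joint coordinates more carefully.)
    Eigen::VectorXd p      = mBVH->GetMotion(t);
    Eigen::VectorXd p_next = mBVH->GetMotion(t+dt);
    return std::make_pair(p, (p_next - p)/dt);
}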

Now my question is: how do you get rid of this target position after learning? Is there a missing part of the code that implements some sort of shrinkage of mTargetPositions in favor of the output of the mimicking controller? Or did you use another technique to achieve this? Or did I misunderstand some parts of the code?

Thanks!

lsw9021 commented 5 years ago

Hi Efelem,

Conceptually, we treat the target pose in the paper as a modified reference pose. The network learns how to modify the reference motion according to the dynamic state of the character. To compute the actual target pose, we add the network output to the pose from the reference motion.
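In code terms, the idea amounts to something like this (a paraphrase of the concept, not a verbatim excerpt; the 6-DOF root offset is an assumption):

// At every control step, the actual target pose is the reference pose
// sampled from the BVH plus the (0.1-scaled) network output.
Eigen::VectorXd p_target = mTargetPositions;
p_target.tail(p_target.rows()-6) += mAction;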