raffaello-camoriano opened this issue 5 years ago
You can get the center of mass by calling cSimCharacter::CalcCOMVel().
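A minimal sketch (not code from the repo) of how the returned COM velocity can be reduced to a planar heading; it assumes tVector is the library's Eigen-based 4D vector, y is the up axis, and the atan2 sign convention is a guess:

#include <cmath>

// Hypothetical helper: heading of the character's COM velocity in the XZ plane.
static double CalcComHeading(const cSimCharacter& sim_char)
{
    tVector com_vel = sim_char.CalcCOMVel();        // world-frame center-of-mass velocity
    com_vel.y() = 0.0;                              // drop the vertical component
    return std::atan2(-com_vel.z(), com_vel.x());   // heading about the vertical axis (sign convention assumed)
}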
To vary the target heading during training, we just apply some random gaussian noise to the heading direction every timestep. The noise is fairly small. But with something like 10% probability, at every timestep, we will also just sample a completely new heading direction between [-pi, pi]. This encourages the policy to also learn sharp turns.
@xbpeng I have tried the above approach, but the result is usually the agent walking in circles (as if it were averaging over the headings). What ratio of heading reward to pose reward did you use to obtain heading-following behaviour?
Also, 10% at every timestep seems very high; is this the 0.00166667 s simulation timestep? At 10% the kin model really jitters.
Yes you will need to tune the weight between the heading and pose reward. For our tasks we were using 0.7 for the weight of the imitation reward and 0.3 for the heading reward. To debug, it might be helpful to just fix the target heading to always point in one direction (e.g. the positive x direction) and see if the character learns to walk straight.
Oh, sorry, I forgot a small detail about the target heading. We are not updating the target heading every environment step. We update it once every 0.25 seconds by applying uniform noise between [-0.15, 0.15] rad to the current heading. There is also a 10% probability, every 0.25 seconds, of switching to a completely random heading. Sorry about the confusion. See if these new settings help at all.
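A minimal sketch of the target-heading update described above, assuming it is invoked once every 0.25 s of simulation time (the function name and RNG plumbing are illustrative, not from the repo):

#include <cmath>
#include <random>

// Called every 0.25 s: perturb the current target heading slightly and, with 10%
// probability, jump to a completely new heading in [-pi, pi].
static double UpdateTargetHeading(double heading, std::mt19937& rng)
{
    std::uniform_real_distribution<double> noise(-0.15, 0.15);
    std::uniform_real_distribution<double> resample(-M_PI, M_PI);
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    heading += noise(rng);            // small uniform perturbation
    if (coin(rng) < 0.1)
    {
        heading = resample(rng);      // occasional sharp turn
    }

    // wrap back into [-pi, pi]
    heading = std::fmod(heading + M_PI, 2.0 * M_PI);
    if (heading < 0.0)
    {
        heading += 2.0 * M_PI;
    }
    return heading - M_PI;
}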
Dear @xbpeng,
I was wondering whether the 5 walking, running and jogging clips with different turning radii, which you mention in Section 10.2 of the paper and show in the main video at 5:32, are downloadable somewhere or could kindly be provided. I would like to investigate the performance gap between single-clip and multi-clip steerable walking controllers.
Sure, you can find the motions here: https://drive.google.com/drive/folders/1TRINX1efQQV4KkWeOQjhYGNoz8PHKqgT?usp=sharing
Thank you so much!
@xbpeng @raffaello-camoriano @ZhengyiLuo Do I have to add heading_reward here?
@xbpeng Should I set root_w = 0 after adding the heading-based task_reward?
There seems to be a conflict between task_reward and the pure-imitation root_w * root_reward, since:
- root_reward includes the root position and orientation mismatch w.r.t. the straight-walking MoCap (which I could not find in Sec. 5.3), and therefore penalizes non-straight walking;
- task_reward, on the other hand, rewards walking along an arbitrary heading.
I also modified the code so that end_eff_reward is expressed relative to the Sim or Kin humanoid's root instead of the world frame, but this is not enough to fix the issue.
The best behaviour obtained with these settings is the humanoid walking in circles.
Do you have any suggestions?
Thank you.
@xbpeng @raffaello-camoriano @ZhengyiLuo Do I have to add heading_reward here?
Yes, you can add an additional reward term for the heading there.
Yes, it might be a good idea to disable the root reward if you are going to add the heading term. You can also play around with the weights for the different objectives a bit to get the desired behaviours. I'm not sure why the character would be walking in circles. But sounds like it could be an issue with the heading reward?
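A rough sketch of what disabling the root term might look like, assuming the sub-reward weights are the local constants defined at the top of cSceneImitate::CalcRewardImitate (the default values shown are from memory, so double-check them against the source; the renormalization keeps the imitation reward in [0, 1]):

// inside cSceneImitate::CalcRewardImitate(...)
double pose_w = 0.5;
double vel_w = 0.05;
double end_eff_w = 0.15;
double root_w = 0.0;   // was 0.2; disable the root position/orientation term for the heading task
double com_w = 0.1;

// renormalize so the remaining terms still sum to 1
double total_w = pose_w + vel_w + end_eff_w + root_w + com_w;
pose_w /= total_w;
vel_w /= total_w;
end_eff_w /= total_w;
root_w /= total_w;
com_w /= total_w;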
@xbpeng, first of all, thanks for sharing your outstanding work. I'm trying to implement the heading too. Thanks for the extra details about how and when you change the heading (and also for the extra motion files of the human turning left and right). I have added the reward here:
double cSceneImitate::CalcReward(int agent_id) const
{
    const cSimCharacter* sim_char = GetAgentChar(agent_id);
    bool fallen = HasFallen(*sim_char);
    double r = 0;
    if (!fallen)
    {
        if (haveGoalDir())
        {
            // blend the imitation and heading objectives with the 0.7/0.3 weights
            r = CalcRewardImitate(*sim_char, *mKinChar) * 0.7 + CalcRewardGoal(*sim_char, *mKinChar) * 0.3;
        }
        else
        {
            r = CalcRewardImitate(*sim_char, *mKinChar);
        }
    }
    return r;
}
double cSceneImitate::CalcRewardGoal(const cSimCharacter& sim_char, const cKinCharacter& kin_char) const
{
    double reward = 0.0;
    auto ct_ctrl = dynamic_cast<cCtController*>(this->GetController().get());
    if (ct_ctrl != nullptr)
    {
        const auto goalDir = ct_ctrl->GetGoalDir();
        // Get the Center-Of-Mass velocity along the XZ plane only...
        tVector com_vel0_world = sim_char.CalcCOMVel();
        com_vel0_world.y() = 0.0;
        com_vel0_world.normalize();
        // penalize the mismatch between the COM velocity direction and the goal direction
        const auto angle = std::max(0.0, 1.0 - com_vel0_world.dot(goalDir)); // std::max instead of the MSVC-only __max
        reward = exp(-2.5 * angle * angle);
    }
    return reward;
}
Does this look correct to you?
P.S.: I must definitely try the suggestion to set root_w to 0 (or close to zero).
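For comparison, the Target Heading reward described in the paper also involves a target speed v^* and penalizes only the shortfall of the COM velocity projected onto the target direction, roughly r^G = exp(-2.5 * max(0, v^* - d^* . v)^2), so moving faster than the target speed is not penalized. A rough sketch of that variant, reusing goalDir and CalcCOMVel from the snippet above (the helper name and the target_speed parameter are illustrative, not from the repo):

#include <algorithm>
#include <cmath>

// Paper-style heading reward: penalize only when the COM velocity projected onto
// the goal direction falls short of a target speed (e.g. ~1 m/s).
static double CalcHeadingReward(const cSimCharacter& sim_char, const tVector& goal_dir,
                                double target_speed)
{
    tVector com_vel = sim_char.CalcCOMVel();
    com_vel.y() = 0.0;                              // restrict to the XZ plane
    double proj_speed = com_vel.dot(goal_dir);      // speed along the target heading
    double vel_err = std::max(0.0, target_speed - proj_speed);
    return std::exp(-2.5 * vel_err * vel_err);
}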
Hi and thanks again for your amazing work.
Since the provided code covers only imitation, I am trying to implement the humanoid walking policy with directional input myself.
To do so, I am following the details in the paper, and in particular Sec. 9 - Target Heading.
I have a couple of questions on implementation details to begin with:
1. Are the target headings d_t^* and speeds v^* randomly generated during policy training? Are they varied in order to produce "informative" bends, changes in pace and otherwise rich walking behaviours? If this is the case, could you please explain it at an implementable level of detail?
2. How do you compute v_t, the center-of-mass velocity of the simulated character?
Thanks a lot.