xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License

It does not work when running with torque controller #63

Open zijiaozeng opened 5 years ago

zijiaozeng commented 5 years ago

I tried to train with the torque controller by setting "--char_ctrls ct" in train_humanoid3d_walk_args.txt, but training fails because of a bug.

Bug details

DeepMimicCore/sim/CtCtrlUtil.cpp

void cCtCtrlUtil::BuildOffsetScaleTorque(const Eigen::MatrixXd& joint_mat, int joint_id, Eigen::VectorXd& out_offset, Eigen::VectorXd& out_scale)
{
    ...
    switch (joint_type)
    {
    case cKinTree::eJointTypeRevolute:
        out_scale.fill(1 / torque_lim);
        break;
    case cKinTree::eJointTypePrismatic:
        out_scale.fill(1 / force_lim);
        break;
    case cKinTree::eJointTypePlanar:
        out_scale.fill(1 / force_lim);
        out_scale[joint_dim - 1] = 1 / torque_lim;
        break;
    case cKinTree::eJointTypeFixed:
        break;
    case cKinTree::eJointTypeSpherical:
        out_scale.fill(1 / torque_lim);
        out_scale[joint_dim - 1] = 0;
        break;
    default:
        assert(false); // unsupported joint type
        break;
    }
}

learning/tf_agent.py

    def _build_normalizers(self):
        with self.sess.as_default(), self.graph.as_default(), tf.variable_scope(self.tf_scope):
            with tf.variable_scope(self.RESOURCE_SCOPE):
                self.s_norm = TFNormalizer(self.sess, 's_norm', self.get_state_size(), self.world.env.build_state_norm_groups(self.id))
                self.s_norm.set_mean_std(-self.world.env.build_state_offset(self.id), 
                                         1 / self.world.env.build_state_scale(self.id))
                self.g_norm = TFNormalizer(self.sess, 'g_norm', self.get_goal_size(), self.world.env.build_goal_norm_groups(self.id))
                self.g_norm.set_mean_std(-self.world.env.build_goal_offset(self.id), 
                                         1 / self.world.env.build_goal_scale(self.id))

                self.a_norm = TFNormalizer(self.sess, 'a_norm', self.get_action_size())
                self.a_norm.set_mean_std(-self.world.env.build_action_offset(self.id), 
                                         1 / self.world.env.build_action_scale(self.id))
        return

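For spherical joints, BuildOffsetScaleTorque sets "out_scale[joint_dim - 1] = 0;", so "1 / self.world.env.build_action_scale(self.id)" in _build_normalizers divides by zero and the resulting std is infinite. I worked around this by changing it to "out_scale[joint_dim - 1] = 1", but training still had not converged after 100000 iterations.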
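The division by zero described above can be reproduced in isolation. This is a minimal sketch, assuming that build_action_scale returns a NumPy array and that a spherical joint contributes 4 action dimensions; the torque_lim value and the variable names are hypothetical, and the guard at the end is only one possible workaround, not the project's fix.

```python
import numpy as np

# Hypothetical stand-in for the array returned by build_action_scale for a
# single spherical joint (assumed 4 action dims). The last entry is the zero
# written by "out_scale[joint_dim - 1] = 0".
torque_lim = 200.0  # illustrative value, not taken from the repository
action_scale = np.array([1 / torque_lim] * 3 + [0.0])

# Same expression as in _build_normalizers: std = 1 / scale.
# NumPy returns inf (with a RuntimeWarning) instead of raising.
with np.errstate(divide="ignore"):
    action_std = 1 / action_scale

has_inf = bool(np.isinf(action_std).any())
print("std:", action_std, "contains inf:", has_inf)

# One possible guard (an assumption): give zero-scale dimensions a scale of 1,
# which mirrors the "out_scale[joint_dim - 1] = 1" workaround.
safe_scale = np.where(action_scale == 0, 1.0, action_scale)
safe_std = 1 / safe_scale
print("guarded std:", safe_std)
```

The guard only removes the infinite std. It says nothing about whether a scale of 1 is a sensible normalization for that dimension, which may be one reason training still fails to converge.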

xbpeng commented 5 years ago

Sorry, but we haven't tried training with the torque controller, so it is unlikely to work without some major modifications and tuning. But it would be great if you are able to get it to work!