xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License

Discriminator function only using one observation instead of two as in the paper #150

EricVoll closed 3 years ago

EricVoll commented 3 years ago

Hi @xbpeng ,

I noticed that the discriminator takes only one tensor as input (see the code excerpt below). The paper, however, defines the discriminator D as a function that maps two consecutive states, each passed through phi, to a real number: D(phi(s), phi(s')). I guess a discriminator could still work well enough without discriminating state transitions, provided the mapping phi(s) includes enough information (velocities, etc.) to characterize the movement; in that case one observation would be enough.

Is that the case or am I interpreting something wrong?

def _calc_disc_reward(self, amp_obs):
    feed = {
        self._amp_obs_agent_ph: amp_obs,
    }
    logits = self.sess.run(self._disc_logits_agent_tf, feed_dict=feed)
    ...

Another explanation I came up with for your implementation is that the observations arriving on the Python side already stack two consecutive observations into one vector. But I did not manage to confirm that hypothesis myself.

Any input would be awesome. Thanks!

xbpeng commented 3 years ago

You're right, amp_obs stacks the observations from two consecutive states into a single input. This is done on the C++ side: https://github.com/xbpeng/DeepMimic/blob/60ebe88b27634ab697d5d3b4d80f2bfd7cd9b23a/DeepMimicCore/scenes/SceneImitateAMP.cpp#L111 and https://github.com/xbpeng/DeepMimic/blob/60ebe88b27634ab697d5d3b4d80f2bfd7cd9b23a/DeepMimicCore/scenes/SceneImitateAMP.cpp#L138
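
For readers following along, here is a minimal sketch of the stacking idea described above. It is not the code in SceneImitateAMP.cpp. The names `build_amp_obs` and `toy_disc_logit`, the linear scoring, and the ordering (previous state first) are assumptions made purely for illustration. It only shows why a discriminator with a single input tensor can still score a transition D(phi(s), phi(s')).

```python
from typing import List, Sequence


def build_amp_obs(phi_prev: Sequence[float], phi_next: Sequence[float]) -> List[float]:
    """Stack the features of two consecutive states into one flat vector.

    Hypothetical helper: the ordering (previous state first) is an
    assumption, not taken from the DeepMimic source.
    """
    if len(phi_prev) != len(phi_next):
        raise ValueError("both states must have the same number of features")
    return list(phi_prev) + list(phi_next)


def toy_disc_logit(amp_obs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Stand-in for a discriminator: a single linear layer over the stacked vector.

    The real discriminator is a neural network; this only shows that one
    input vector carries information about both s and s'.
    """
    if len(amp_obs) != len(weights):
        raise ValueError("weights must match the stacked observation size")
    return sum(o * w for o, w in zip(amp_obs, weights)) + bias


# Features phi(s) and phi(s') for two consecutive states (made-up numbers).
phi_s = [0.1, 0.2, 0.3]
phi_s_next = [0.2, 0.1, 0.4]

amp_obs = build_amp_obs(phi_s, phi_s_next)  # length 6: the single "tensor"
weights = [1.0] * len(amp_obs)              # arbitrary illustrative weights
logit = toy_disc_logit(amp_obs, weights)
```

With this layout, the Python-side reward function only ever sees `amp_obs`, yet the vector already encodes the transition from s to s'.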

EricVoll commented 3 years ago

Understood. Thanks for the quick response! I did not manage to find that part on the C++ side for some reason. It looks obvious now, though.