xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License

Computing the average normalized return #112

Open ManifoldFR opened 4 years ago

ManifoldFR commented 4 years ago

Hi,

I am trying to reproduce results in the paper and evaluate some new policies I am training (using clips from the SFU dataset and some custom motions extracted using SfV). For comparison, I would like to use the NR (Normalized Return) metric used in the paper.

The training logs have Train_Return and Test_Return columns, but from what I'm seeing they are not averages over 32 episodes. I'm also not sure how the maximum return is defined: for pure imitation with no task objective, the reward should have a maximum of 1 per timestep, right? (Each reward term is upper-bounded by 1, reached under "perfect" imitation, and the weights sum to 1.) Is there utility code to run a policy while recording the returns and compute the NR score? Thanks.
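For reference, the bound I have in mind follows from the paper's imitation reward, a weighted sum of terms that each lie in (0, 1] (the weights below are the ones reported in the paper):

```latex
r_t = 0.65\, r^p_t + 0.10\, r^v_t + 0.15\, r^e_t + 0.10\, r^c_t,
\qquad r^{(\cdot)}_t = \exp\!\left(-\alpha^{(\cdot)}\, \mathrm{err}^{(\cdot)}_t\right) \in (0, 1]
```

Since the weights sum to 1 and each term is at most 1, the per-timestep reward satisfies r_t <= 1.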

xbpeng commented 4 years ago

Each episode has a maximum length of 20 seconds, as specified here: https://github.com/xbpeng/DeepMimic/blob/a458f1e1da928d04ef1434bb53b97264d53f4102/args/train_humanoid3d_spinkick_args.txt#L6. The policy runs at 30 Hz, so an episode is at most 600 timesteps. The reward function is bounded in [0, 1], so the maximum return a policy can achieve is 600. To compute the normalized return, just divide the return values in the log by 600.
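A minimal sketch of that normalization (the log path below is a placeholder, and I'm assuming the log is a whitespace-delimited table with a header row containing the Test_Return column mentioned above):

```python
# Normalize the returns in a DeepMimic training log by the maximum
# achievable return (20 s episode limit x 30 Hz policy = 600 steps,
# reward bounded in [0, 1], so max return = 600).
import pandas as pd

EPISODE_SECONDS = 20
POLICY_HZ = 30
MAX_RETURN = EPISODE_SECONDS * POLICY_HZ  # 600

# Placeholder path; point this at your own log file.
log = pd.read_csv("agent0_log.txt", sep=r"\s+")
log["Test_NR"] = log["Test_Return"] / MAX_RETURN
print(log[["Test_Return", "Test_NR"]].tail())
```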

ManifoldFR commented 4 years ago

Thank you, I saw the maximum time limit but didn't put two and two together. One last question: were the NR scores in the paper computed in training mode (stochastic policy, i.e. the Train_Return column) or in test mode (with the deterministic policy)?

xbpeng commented 4 years ago

We evaluate performance with the deterministic policy.

hellozjj commented 4 years ago


@ManifoldFR Could you give me some advice on how to get the root joint position using SfV? How do you extract the root position?

ManifoldFR commented 4 years ago

What I did works if the camera in the scene doesn't move: it recovers the motion of the root joint with respect to the camera's reference frame. From the scale and (tx, ty) camera coordinates in the pose vector, you can recover the corresponding camera-frame [x, y, z] translation using the image preprocessing parameters (I reused the preprocessing/postprocessing code from HMR for this). Then add that translation to the root joint, which is the hips joint (the same as in the COCO dataset) if you're looking at the SfV output before retargeting.

Here is where I do it in the code we used (the filename is misleading: we are actually converting a CSV to BVH).

You can also try doing the same thing if you're using your own retargeting code.
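For anyone following along, here is a rough self-contained sketch of that scale/translation conversion under the usual HMR weak-perspective conventions (the focal length and crop size below are assumptions; they must match the values used in your preprocessing):

```python
import numpy as np

def weak_perspective_to_translation(cam, focal_length=500.0, img_size=224):
    # cam = [s, tx, ty] from the HMR camera prediction. The depth tz
    # follows from the weak-perspective scale by similar triangles;
    # focal_length and img_size are assumed preprocessing parameters.
    s, tx, ty = cam
    tz = 2.0 * focal_length / (img_size * s + 1e-9)  # guard against s = 0
    return np.array([tx, ty, tz])

# Example with a made-up camera prediction:
cam = np.array([0.9, 0.05, -0.1])  # [scale, tx, ty]
root_offset = weak_perspective_to_translation(cam)
print(root_offset)

# Per frame, add root_offset to the camera-relative hips joint to
# recover a root trajectory; this is only valid if the real camera
# is static.
```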

hellozjj commented 4 years ago


@ManifoldFR Thanks very much!