xbpeng / awr

Implementation of advantage-weighted regression.
MIT License

Parameters used for motion imitation #2

Open ManifoldFR opened 4 years ago

ManifoldFR commented 4 years ago

Hello,

I am trying to use this algorithm (rewritten in PyTorch with Gym vectorized envs) for motion imitation, starting with the PyBullet implementation of the DeepMimic environment. In the paper, section 5.3, there is a comparison of DeepMimic's modified off-policy PPO with AWR and RWR on some of DeepMimic's tasks, but no further information is given on which hyperparameters were used there.

The appendix gives some parameters that I think apply to the usual MuJoCo benchmarks, but I'm not sure whether they also apply to the DeepMimic tasks (for instance, the MLP hidden dimensions of (128, 64) don't seem right for DeepMimic, since the original paper uses (1024, 512)).
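For reference, here is a minimal PyTorch sketch of a policy/value trunk with the (1024, 512) hidden sizes mentioned above; the layer sizes come from this thread, while the activation choice and function name are my own assumptions rather than the paper's or repo's exact setup:

```python
import torch.nn as nn

def make_mlp(obs_dim, out_dim, hidden=(1024, 512)):
    # Hidden sizes follow the DeepMimic-style humanoid networks discussed above;
    # ReLU activations are an assumption, not taken from the paper or repo.
    layers, last = [], obs_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.ReLU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)
```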

xbpeng commented 4 years ago

Sure, here are the hyperparameters for the motion imitation tasks with the humanoid:

"actor_net_layers": [1024, 512], "actor_stepsize": 0.0000015, "actor_momentum": 0.9, "actor_init_output_scale": 0.01, "actor_batch_size": 256, "actor_steps": 200, "action_std": 0.05,

"critic_net_layers": [1024, 512], "critic_stepsize": 0.01, "critic_momentum": 0.9, "critic_batch_size": 256, "critic_steps": 100,

"discount": 0.95, "samples_per_iter": 4096, "replay_buffer_size": 50000, "normalizer_samples": 1000000,

"weight_clip": 50, "td_lambda": 0.95, "temp": 1.0,

ManifoldFR commented 4 years ago

Thanks! I have a couple of follow-up questions: were actions normalized as in the original DeepMimic code, and was MPI used to speed up data collection and training?

xbpeng commented 4 years ago

Yes, actions were also normalized. Besides using AWR instead of PPO, the rest of the setup was the same.
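For anyone reproducing this, here is a rough sketch of the kind of affine action normalization being referred to, assuming a DeepMimic-style offset/scale derived from the action bounds; the exact construction in the original code may differ, so treat this as an illustration only:

```python
import numpy as np

def build_action_offset_scale(low, high):
    # Map the env's action bounds [low, high] to roughly [-1, 1].
    # This mirrors the general idea of DeepMimic-style action normalization;
    # the exact offsets/scales in the original code may differ.
    offset = -0.5 * (high + low)
    scale = 2.0 / (high - low)
    return offset, scale

def normalize_action(a, offset, scale):
    return (a + offset) * scale

def unnormalize_action(a_norm, offset, scale):
    return a_norm / scale - offset
```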

ManifoldFR commented 4 years ago

In the paper's appendix C, a temperature of 0.05 is said to be used with step size 0.00005, but the config file in this repo sets the temperature to 1.0 and changes the learning rates. Which one should be used? I can see that this parameter involves a trade-off: in my experiments, adjusting it made the difference between being able to train on an environment or not.

xbpeng commented 4 years ago

In the code we are using advantage normalization, so the temperature is just set to 1.0. The temp of 0.05 was used without advantage normalization. If you are using the code, a temp of 1 should work for the tasks.
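To make this concrete, here is a small sketch of how the exponentiated-advantage weights in AWR can be formed with the advantage normalization and weight clipping described above; the function and variable names are mine, not the repo's:

```python
import numpy as np

def compute_awr_weights(advantages, temp=1.0, weight_clip=50.0, normalize=True):
    # With advantage normalization the advantages are roughly unit scale,
    # which is why a temperature of 1.0 works; without it, a much smaller
    # temperature (e.g. 0.05 from the paper's appendix) plays a similar role.
    adv = np.asarray(advantages, dtype=np.float64)
    if normalize:
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    weights = np.exp(adv / temp)
    # Clip the largest weights so a few samples cannot dominate the regression.
    return np.minimum(weights, weight_clip)
```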

ManifoldFR commented 4 years ago

Thanks! I'm interested in how the temperature and weight clip interact: I guess having a lot of weights clipped should be bad news, right? Intuitively, if half of the weights are clipped to 20, then you lose information about the relative quality of the corresponding actions in the gradient. Perhaps I'll look into it.
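One cheap way to look into this would be to log the fraction of samples whose exponentiated advantage saturates at the clip; a rough sketch, where the function name and the use of advantage normalization are my own assumptions:

```python
import numpy as np

def clipped_fraction(advantages, temp=1.0, weight_clip=50.0):
    # Fraction of samples whose exponentiated advantage hits the clip.
    # If many samples share the clipped weight, the relative ordering of their
    # advantages no longer influences the regression targets.
    adv = np.asarray(advantages, dtype=np.float64)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # advantage normalization, as in the code
    weights = np.exp(adv / temp)
    return float((weights >= weight_clip).mean())
```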