PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Hi, thank you very much for sharing your code.
This might be a naive question, but I am wondering what the general guideline is for setting args.num-steps. I saw that the default value is 5, but in the README you also set it to 128 for Atari and 2048 for MuJoCo.
1) Should this be proportional to the maximum number of steps in one episode? Say the agent has at most 500 time steps before an episode terminates. What should I set args.num-steps to? 2) Or is it simply constrained by GPU memory?
Any help will be greatly appreciated. Thanks!
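For context, in A2C/PPO-style implementations num-steps is usually a fixed rollout length collected across all parallel environments before each gradient update, independent of episode length (episodes may end mid-rollout and simply restart). A minimal sketch of that idea, using a hypothetical `step_env` toy stepper (not this repo's actual API):

```python
import random

num_steps = 5       # rollout length per update (5 default; 128 for Atari, 2048 for MuJoCo per the README)
num_processes = 16  # number of parallel environments

def step_env(proc_id):
    """Hypothetical toy environment step: returns (obs, reward, done).
    'done' can fire at any time; the rollout length is unaffected."""
    return (proc_id, 1.0, random.random() < 0.01)

def collect_rollout(step_fn, num_steps, num_processes):
    """Collect a fixed-size batch of num_steps * num_processes transitions.
    Episode boundaries do not change the batch size; they are just
    recorded via the 'done' flag for return/advantage computation."""
    rollout = []
    for _ in range(num_steps):
        rollout.extend(step_fn(p) for p in range(num_processes))
    return rollout

batch = collect_rollout(step_env, num_steps, num_processes)
print(len(batch))  # num_steps * num_processes = 80 transitions per update
```

So the batch consumed by each update is num_steps * num_processes transitions regardless of whether episodes are 500 steps or 50; the practical constraints are advantage-estimation horizon and memory, not the episode cap.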