PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
Is setting the flag "use-proper-time-limits" to True recommended for all gym environments with a time limit? #259
It seems to me that this flag should be set to True whenever an environment has a maximum episode time limit. For the flag to take effect, the original env has to be wrapped with the TimeLimitMask gym wrapper. However, lines 47-48 in envs.py suggest that the wrapper is applied only when the env class name contains the string 'TimeLimit'. Could you explain why? Thank you!
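To make the question concrete, here is a minimal sketch of the behaviour I am describing. It does not import gym and is not the code from envs.py: `ToyEnv`, the stand-in `TimeLimit` class, `maybe_mask`, and the `bad_transition` info key are all assumptions for illustration, and the old 4-tuple `step` API is assumed. In this sketch the mask reads `_max_episode_steps` and `_elapsed_steps`, which only the time-limit wrapper provides. That may be one reason for a class-name check, but I am not sure it is the actual reason.

```python
class ToyEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        # Old-style gym API assumed: (obs, reward, done, info).
        return 0, 1.0, False, {}


class TimeLimit:
    """Stand-in for a time-limit wrapper; not gym's real class."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            done = True  # episode cut off by the time limit
        return obs, rew, done, info


class TimeLimitMask:
    """Sketch of a mask that flags time-limit terminations in `info`."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        # These two attributes exist only on the time-limit wrapper.
        if done and self.env._elapsed_steps == self.env._max_episode_steps:
            info = dict(info, bad_transition=True)  # key name is an assumption
        return obs, rew, done, info


def maybe_mask(env):
    """Apply the mask only when the class name contains 'TimeLimit'."""
    if 'TimeLimit' in type(env).__name__:
        return TimeLimitMask(env)
    return env


# Usage: a wrapped env gets the mask, a bare env is returned unchanged.
masked_env = maybe_mask(TimeLimit(ToyEnv(), max_episode_steps=3))
bare_env = maybe_mask(ToyEnv())
```

With this setup, an env whose time limit is enforced some other way (for example inside the env itself) would never receive the mask, so the flag would have no effect for it. That is the case I am asking about.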