ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

Is setting the flag "use-proper-time-limits" to True recommended for all gym environments with a time limit? #259

Closed: PeixinC closed this issue 3 years ago

PeixinC commented 3 years ago

It seems to me that this flag should be set to True whenever an environment has a maximum episode length (a time limit). For the flag to take effect, the original env has to be wrapped with the TimeLimitMask gym wrapper. However, lines 47-48 of envs.py suggest that this wrapper is applied only when the env's class name contains the string 'TimeLimit'. Could you explain why? Thank you!
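To make the mechanism in the question concrete, here is a minimal, dependency-free sketch. `ToyEnv`, the simplified `TimeLimit` stand-in and the `maybe_wrap` helper are hypothetical, and the `'bad_transition'` info key is an assumed name. It assumes the old four-tuple `step` API and the `_max_episode_steps` / `_elapsed_steps` counters that older gym `TimeLimit` wrappers keep. It also hints at one plausible reason for the class-name check: the mask reads step counters that only a `TimeLimit` wrapper maintains.

```python
class ToyEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        # Old gym API: (obs, reward, done, info)
        return 0.0, 1.0, False, {}


class TimeLimit:
    """Simplified stand-in for gym's TimeLimit wrapper (assumption: it tracks
    _max_episode_steps and _elapsed_steps, as older gym versions do)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            done = True  # cut off by the time limit, not by the task itself
        return obs, rew, done, info


class TimeLimitMask:
    """Sketch of a mask wrapper: flags transitions ended by the time limit."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        # Reads the TimeLimit counters, so it only works on a TimeLimit env.
        if done and self.env._max_episode_steps == self.env._elapsed_steps:
            info['bad_transition'] = True
        return obs, rew, done, info


def maybe_wrap(env):
    # Name-based test like the one described in the issue: wrap only TimeLimit envs.
    if 'TimeLimit' in env.__class__.__name__:
        env = TimeLimitMask(env)
    return env


if __name__ == "__main__":
    env = maybe_wrap(TimeLimit(ToyEnv(), max_episode_steps=3))
    env.reset()
    infos = [env.step(0)[3] for _ in range(3)]
    print(infos)  # [{}, {}, {'bad_transition': True}]
```

An env without a `TimeLimit` wrapper passes through `maybe_wrap` unchanged, so no transition is ever flagged for it. Note that a task that happens to terminate on exactly the last allowed step is also flagged here; the counters alone cannot tell the two cases apart.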

ikostrikov commented 3 years ago

Yes, for all environments that use TimeLimit and have truncated episodes.
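Truncation matters because of the value target. When an episode ends only because the time limit was hit, the state at the cutoff still has future reward, so treating it as terminal (a return of zero from there on) biases the critic. The sketch below computes discounted returns for one rollout and, with the flag on, falls back to a value estimate at time-limit cutoffs instead of cutting the return to zero. Function and argument names are illustrative. `bad_masks[t] == 0.0` is assumed to mark a time-limit cutoff after step `t` (what a mask wrapper would report), and falling back to the value of the last observed state is a simplification for setups where the observation after the cutoff is no longer available.

```python
def compute_returns(rewards, values, masks, bad_masks, next_value,
                    gamma=0.99, use_proper_time_limits=True):
    """Discounted returns for a rollout of length T (illustrative sketch).

    rewards[t], values[t]: reward and value estimate at step t.
    masks[t]: 0.0 if the episode ended after step t, else 1.0.
    bad_masks[t]: 0.0 if that end was a time-limit cutoff, else 1.0.
    next_value: value estimate of the state following the last step.
    """
    T = len(rewards)
    returns = [0.0] * T
    ret = next_value  # bootstrap from the state after the rollout
    for t in reversed(range(T)):
        # A zero mask drops the future whenever the episode ended after step t.
        ret = rewards[t] + gamma * masks[t] * ret
        if use_proper_time_limits and bad_masks[t] == 0.0:
            # Ended by the time limit, not by the task: use the value
            # estimate instead of pretending the future is worth zero.
            ret = values[t]
        returns[t] = ret
    return returns


if __name__ == "__main__":
    args = dict(rewards=[1.0, 1.0, 1.0, 1.0], values=[5.0] * 4,
                masks=[1.0, 0.0, 1.0, 1.0], bad_masks=[1.0, 0.0, 1.0, 1.0],
                next_value=5.0, gamma=0.5)
    print(compute_returns(**args))  # [3.5, 5.0, 2.75, 3.5]
    print(compute_returns(**args, use_proper_time_limits=False))  # [1.5, 1.0, 2.75, 3.5]
```

In this example the episode boundary after step 1 is a time-limit cutoff. With the flag off, steps 0 and 1 see the cutoff as a true terminal state and get low returns; with it on, they keep bootstrapping from the critic. For episodes that end because the task is actually over (`bad_masks` all 1.0), both settings give the same returns.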