PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
Updates: Support the latest Atari environment and state entropy maximization-based exploration. #296
Update for supporting the latest Atari environment
Tested using the following dependencies:
stable-baselines3==1.5.0
gym==0.21.0
ale-py==0.7.4
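
To confirm that a local environment matches the versions listed above, a small check like the following may help. This is only a sketch: the `TESTED_PINS` dictionary and the `find_mismatches` helper are illustrative names, not part of this repository.

```python
from importlib import metadata

# Versions copied from the dependency list above.
TESTED_PINS = {
    "stable-baselines3": "1.5.0",
    "gym": "0.21.0",
    "ale-py": "0.7.4",
}


def find_mismatches(pins):
    """Return {package: (wanted, installed_or_None)} for every pin that is not met.

    Hypothetical helper for illustration; not part of this repository.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(TESTED_PINS).items():
        print(f"{name}: tested with {wanted}, found {installed or 'not installed'}")
```

An empty result means the installed versions match the tested ones. Other versions may still work, but they are not the ones listed here.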
Update for supporting state entropy maximization-based exploration
Intrinsic rewards can improve exploration in complex environments with high-dimensional observations, so I added a module based on "State entropy maximization with random encoders for efficient exploration (RE3)". Since RE3 requires no auxiliary models, it adds little computational overhead. Use --use--sem to invoke it!
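
The idea behind this kind of intrinsic reward can be sketched in a few lines. The example below is a rough illustration, not the code used in this repository. It assumes a fixed random linear map as a stand-in for the random encoder and rewards each observation by the log of its distance to its k-th nearest neighbour in the encoded batch. The function names and the choice of k are assumptions made for the example, and it uses plain Python lists rather than PyTorch tensors so that it runs without extra dependencies.

```python
import math
import random


def make_random_encoder(obs_dim, latent_dim, seed=0):
    """Build a fixed random linear map (assumed stand-in encoder); it is never trained."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(obs_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(obs_dim)]
               for _ in range(latent_dim)]

    def encode(obs):
        # One dot product per latent dimension.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    return encode


def knn_intrinsic_rewards(observations, encode, k=3):
    """Return log(1 + distance to the k-th nearest neighbour) for each observation.

    Distances are measured between encoded observations within the batch.
    If fewer than k neighbours exist, the farthest available one is used.
    """
    latents = [encode(o) for o in observations]
    rewards = []
    for i, y in enumerate(latents):
        dists = sorted(math.dist(y, other)
                       for j, other in enumerate(latents) if j != i)
        kth = dists[min(k, len(dists)) - 1] if dists else 0.0
        rewards.append(math.log(1.0 + kth))
    return rewards


if __name__ == "__main__":
    encode = make_random_encoder(obs_dim=4, latent_dim=2)
    batch = [[0.0] * 4, [0.0] * 4, [0.0] * 4, [10.0] * 4]
    # Repeated observations get zero reward; the outlier gets a positive one.
    print(knn_intrinsic_rewards(batch, encode, k=1))
```

Observations that sit far from the rest of the batch in the encoded space receive a larger bonus, which pushes the agent toward rarely visited states. In practice such a bonus would typically be scaled by a coefficient and added to the environment reward; that coefficient is not shown here.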