PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Hello,
I should have opened this issue when we first noticed the problem a while ago: the CNN architecture does not match the Nature CNN (I assume matching it was the goal). The last convolutional layer should have 64 output channels too, like the second one.
This repo: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/f60ac80147d7fcd3aa7e9210e37d5734d9b6f4cd/a2c_ppo_acktr/model.py#L176-L180
For comparison, the SB2 repo (which follows the OpenAI Baselines repo): https://github.com/hill-a/stable-baselines/blob/a4efff01ca678bcceee3eb21801c410612df209f/stable_baselines/common/policies.py#L16-L29
and the SB3 repo: https://github.com/DLR-RM/stable-baselines3/blob/88e1be9ff5e04b7688efa44951f845b7daf5717f/stable_baselines3/common/torch_layers.py#L76-L84
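
To make the difference concrete, here is a rough sketch of the shape arithmetic in plain Python (no PyTorch needed). It assumes the usual 84x84 Atari input and unpadded convolutions, and it assumes the linked lines in this repo use 32 output channels in the third convolution, which is what I remember seeing. The names `conv_out`, `flat_features`, `NATURE_CNN` and `REPORTED_CNN` are only illustrative and do not come from any of the repos.

```python
def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded convolution along one dimension."""
    return (size - kernel) // stride + 1


# Each entry is (out_channels, kernel_size, stride) for one conv layer.
# Nature CNN as described in the DQN Nature paper: 32 -> 64 -> 64 channels.
NATURE_CNN = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
# Assumption: what the linked lines in this repo currently define (32 at the end).
REPORTED_CNN = [(32, 8, 4), (64, 4, 2), (32, 3, 1)]


def flat_features(layers, input_size=84):
    """Number of features after flattening the output of the last conv layer."""
    size = input_size
    channels = None
    for channels, kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    return channels * size * size


nature_flat = flat_features(NATURE_CNN)      # 64 * 7 * 7
reported_flat = flat_features(REPORTED_CNN)  # 32 * 7 * 7
print(nature_flat, reported_flat)  # prints: 3136 1568
```

If that assumption holds, the fix would be to change the third convolution to 64 output channels and the input size of the following linear layer from 32 * 7 * 7 to 64 * 7 * 7.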