ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.53k stars 832 forks

CNN Architecture #280

Open araffin opened 3 years ago

araffin commented 3 years ago

Hello,

I should have opened this issue when we first noticed the problem a while ago: the CNN architecture does not match the Nature CNN (I assume matching it was the goal). The last convolutional layer should have 64 channels too.

This repo: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/f60ac80147d7fcd3aa7e9210e37d5734d9b6f4cd/a2c_ppo_acktr/model.py#L176-L180

SB2 repo (following OpenAI Baselines repo): https://github.com/hill-a/stable-baselines/blob/a4efff01ca678bcceee3eb21801c410612df209f/stable_baselines/common/policies.py#L16-L29

or in the SB3 repo: https://github.com/DLR-RM/stable-baselines3/blob/88e1be9ff5e04b7688efa44951f845b7daf5717f/stable_baselines3/common/torch_layers.py#L76-L84
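
To make the difference concrete, here is a minimal sketch that computes the feature-map sizes of the two conv stacks. It assumes 84x84 Atari frames and no padding. The helper names (`conv_out`, `flattened_features`) are hypothetical, and the 32-channel last layer in `NARROW_LAST` is my assumption about what the linked lines in this repo contain, so please check it against the code.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(layers, input_size=84):
    """Return (channels, spatial_size, flattened_size) after a conv stack.

    `layers` is a list of (out_channels, kernel, stride) tuples.
    """
    size = input_size
    channels = None
    for out_channels, kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        channels = out_channels
    return channels, size, channels * size * size


# Nature CNN conv stack: (out_channels, kernel, stride)
NATURE_CNN = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
# Assumed variant whose last layer has 32 channels instead of 64
NARROW_LAST = [(32, 8, 4), (64, 4, 2), (32, 3, 1)]

nature = flattened_features(NATURE_CNN)
narrow = flattened_features(NARROW_LAST)
print(nature)  # (64, 7, 3136)
print(narrow)  # (32, 7, 1568)
```

Under these assumptions, the spatial size is 7x7 in both cases, but the flattened input to the following linear layer is halved (1568 instead of 3136) when the last layer has 32 channels.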