ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.56k stars, 831 forks

init of neural network #230

Open KK666-AI opened 4 years ago

KK666-AI commented 4 years ago

Dear Author,

Thanks for sharing this excellent work on reproducing reinforcement learning algorithms. I noticed that you use init_ = lambda m: init(m, nn.init.orthogonal_, lambda x: nn.init.constant_(x, 0)) to initialize the neural networks, and I find that it makes the networks much more stable. However, I don't understand the theory behind this trick. Could you explain it or point to some related papers?

Thanks.
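For readers following along, here is a minimal sketch of what the quoted one-liner does. The body of the init helper is an assumption about the repository's utility function, not a verbatim copy; the layer size, the seed, and the names ortho_error, bias_abs_max and norm_ratio are made up for illustration. It only checks two properties that hold by construction: the weight matrix of a square layer is orthogonal (so it preserves vector norms) and the bias is zero. It does not by itself explain the stability observed above.

```python
import torch
import torch.nn as nn


def init(module, weight_init, bias_init, gain=1.0):
    # Assumed shape of the helper: apply one initializer to the weights
    # and another to the bias, then return the module for chaining.
    weight_init(module.weight.data, gain=gain)
    bias_init(module.bias.data)
    return module


# The one-liner quoted in the question: orthogonal weights, zero bias.
init_ = lambda m: init(m, nn.init.orthogonal_, lambda x: nn.init.constant_(x, 0))

torch.manual_seed(0)
layer = init_(nn.Linear(64, 64))  # hypothetical layer size
W = layer.weight.data

# For a square layer, W @ W.T should be (numerically) the identity.
ortho_error = (W @ W.t() - torch.eye(64)).abs().max().item()

# The bias should be exactly zero.
bias_abs_max = layer.bias.data.abs().max().item()

# An orthogonal matrix preserves the Euclidean norm of its input.
x = torch.randn(64)
norm_ratio = ((W @ x).norm() / x.norm()).item()

print(f"orthogonality error: {ortho_error:.2e}")
print(f"max |bias|:          {bias_abs_max}")
print(f"|Wx| / |x|:          {norm_ratio:.6f}")
```

With a nonlinearity such as ReLU or tanh between layers the norm is no longer preserved exactly, which is presumably why a gain argument is exposed.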

realiti4 commented 4 years ago

Hi, I've also noticed that without this init method, A2C doesn't improve when the number of processes is not high enough, and I'm also curious about the reason behind it. I'd be happy to learn more if someone can explain. Thank you.

KarlXing commented 4 years ago

Looking forward to an explanation too!

shtse8 commented 4 years ago

I am looking forward to an explanation as well!