grooviiee opened this issue 1 year ago
We currently understand, at a high level, how the PPO algorithm finds the best way to make the agent perform better.
More precisely, we need to understand it in depth. Firstly, understand the GAE algorithm (not VAE): GAE computes the advantage as a weighted sum of TD errors, and the weighting (λ) is there to control the bias/variance trade-off (see the first sketch after this list).
Secondly, understand probability distributions. Thirdly, understand KL divergence, which is used to measure how similar two distributions are (see the second sketch below).
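For the first point, here is a minimal sketch of GAE, assuming a single rollout of rewards, value estimates, and done flags; the function name and inputs are hypothetical, not taken from any particular MAPPO codebase.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return advantages as a lambda-weighted sum of TD errors, plus returns."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t), masked at episode ends
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted accumulation controlled by lambda
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + np.asarray(values, dtype=np.float32)
    return advantages, returns
```

Setting lam closer to 1 gives lower bias but higher variance; smaller lam does the opposite.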
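For the third point, a minimal sketch of KL divergence between two discrete action distributions (e.g. old vs. new policy outputs); the names and probabilities below are illustrative only.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for discrete probability vectors p and q."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

old_policy = [0.7, 0.2, 0.1]   # pi_old(a|s)
new_policy = [0.6, 0.3, 0.1]   # pi_new(a|s)
print(kl_divergence(old_policy, new_policy))  # small value -> similar policies
```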
Plus, most MAPPO implementations use only a single actor-critic network shared across environments...
We need to modify the training algorithm so that each agent keeps its own separate actor-critic network (see the sketch at the end of this issue).
First, understand the MAPPO algorithm.
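For the separated actor-critic direction mentioned above, a minimal PyTorch sketch of keeping one independent actor and critic per agent instead of a single shared network; class names, layer sizes, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        # Returns action logits for a discrete action space
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        # Returns the state-value estimate V(s)
        return self.net(state)

# One independent actor/critic (and optimizer) per agent, no parameter sharing.
num_agents, obs_dim, state_dim, act_dim = 3, 10, 30, 5
actors  = [Actor(obs_dim, act_dim) for _ in range(num_agents)]
critics = [Critic(state_dim) for _ in range(num_agents)]
optims  = [torch.optim.Adam(list(a.parameters()) + list(c.parameters()), lr=3e-4)
           for a, c in zip(actors, critics)]
```

Each agent's optimizer then only updates that agent's parameters, which is the separation the issue asks for, as opposed to the shared-parameter setup in most MAPPO implementations.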