shareeff / PPO

TensorFlow implementation of the proximal policy optimization (PPO) algorithm

What is the advantage function that you have used? #1

Closed shamanez closed 6 years ago

shareeff commented 6 years ago

The advantage function is: `self.adv = self.y - self.value`
link 1: https://github.com/shareeff/PPO/blob/5aff8ce024b4ad92774244eda3e4cba7e603de01/ppo.py#L51
link 2: https://github.com/shareeff/PPO/blob/5aff8ce024b4ad92774244eda3e4cba7e603de01/worker.py#L110
link 3: https://github.com/shareeff/PPO/blob/5aff8ce024b4ad92774244eda3e4cba7e603de01/worker.py#L114

`self.y` is the discounted reward.
link 1: https://github.com/shareeff/PPO/blob/5aff8ce024b4ad92774244eda3e4cba7e603de01/worker.py#L117
link 2: https://github.com/shareeff/PPO/blob/5aff8ce024b4ad92774244eda3e4cba7e603de01/worker.py#L85

You can find a similar implementation here (see also the sketch below). link: https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/4abdde193dad3ae321e792f1fa5fb91a40b93b78/contents/12_Proximal_Policy_Optimization/simply_PPO.py#L46
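
To make the computation above concrete, here is a minimal NumPy sketch of the same idea: compute the discounted return for each step, then subtract the critic's value prediction to get the advantage. The function name, the `gamma=0.99` default, and the example arrays are illustrative assumptions, not taken from the repo.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Compute G_t = r_t + gamma * G_{t+1} by a backward pass,
    optionally bootstrapping from a value estimate of the final state."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Advantage as described in the answer above:
# discounted return minus the critic's value prediction.
rewards = np.array([1.0, 0.0, 1.0, 1.0])  # example rewards (hypothetical)
values = np.array([2.5, 1.8, 1.9, 1.0])   # example V(s_t) from the critic (hypothetical)
y = discounted_returns(rewards)           # corresponds to self.y
adv = y - values                          # corresponds to self.adv = self.y - self.value
```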

shamanez commented 6 years ago

Is there any reason why you haven't used the Generalized Advantage Estimation (GAE) function?
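
For reference, GAE (Schulman et al., 2016) replaces the plain return-minus-value advantage with an exponentially weighted sum of TD errors. A minimal sketch, assuming the usual formulation with hyperparameters `gamma` and `lam` (the values and names here are illustrative, not from this repo):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l},
    with TD error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    values_ext = np.append(values, last_value)  # append bootstrap value V(s_{T+1})
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):  # backward accumulation of deltas
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```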

shareeff commented 6 years ago

I implemented this project for learning purposes. That's why I have tried to keep it simple. Thanks