mhubii / ppo_libtorch

C++ implementation of Proximal Policy Optimization

Where is the formula in the C++ files? #1

Open · fatalfeel opened this issue 4 years ago

fatalfeel commented 4 years ago

https://github.com/Mikoto10032/DeepLearning/blob/master/books/%5B%E6%B7%B1%E5%BA%A6%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0%5D%5BHung-yi%20Lee%5D/PPO%20(v3).pdf

On page 9 of this PDF, the formula is: $p_\theta(\tau) = p(s_1)\,p_\theta(a_1 \mid s_1)\,p(s_2 \mid s_1, a_1)\,p_\theta(a_2 \mid s_2)\,p(s_3 \mid s_2, a_2)\cdots$
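For reference, taking the logarithm of that product turns it into a sum. The environment terms $p(s_1)$ and $p(s_{t+1} \mid s_t, a_t)$ do not depend on $\theta$, so only the policy terms $p_\theta(a_t \mid s_t)$ survive in the gradient. This is a worked derivation added for illustration, not something taken from the repository:

```latex
\begin{aligned}
\log p_\theta(\tau) &= \log p(s_1) + \sum_{t=1}^{T} \Big[ \log p_\theta(a_t \mid s_t) + \log p(s_{t+1} \mid s_t, a_t) \Big] \\
\nabla_\theta \log p_\theta(\tau) &= \sum_{t=1}^{T} \nabla_\theta \log p_\theta(a_t \mid s_t)
\end{aligned}
```

This is why a policy-gradient implementation only ever needs the per-step action log-probabilities and never the full trajectory probability.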

Where is this formula in the C++ files? Which function implements it, or where is it defined? Please help me find it.

fatalfeel commented 4 years ago

In a Bayesian network, the code really does calculate the conditional probabilities explicitly (see http://dlib.net/bayes_net_ex.cpp.html).

The PPO algorithm uses terms such as $p_\theta(a_t \mid s_t)$: https://github.com/Mikoto10032/DeepLearning/blob/master/books/%5B%E6%B7%B1%E5%BA%A6%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0%5D%5BHung-yi%20Lee%5D/PPO%20(v3).pdf

I cannot connect $p_\theta(a_t \mid s_t)$ to the source code. Or does the stack of layers, each computing $Y = W x + B$, represent this probability? I am confused about how the formula relates to the source code. Please help me sort this out.
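One way to connect `Y = W x + B` to $p_\theta(a_t \mid s_t)$, assuming a Gaussian policy for continuous actions: the layer output is not the probability itself but the mean $\mu$ of a distribution over actions, and $p_\theta(a_t \mid s_t)$ is that distribution's density evaluated at the chosen action. The sketch below assumes a single linear layer, a scalar action and a learned log standard deviation. `LinearGaussianPolicy` is a hypothetical name for illustration, not a class from ppo_libtorch, and it uses `std::vector<double>` instead of `torch::Tensor` so it compiles without LibTorch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical one-layer Gaussian policy for a scalar action
// (illustration only, not the classes used in ppo_libtorch).
struct LinearGaussianPolicy {
    std::vector<double> W;  // weights, one per state feature
    double b;               // bias
    double log_std;         // learned log standard deviation

    // Y = W x + B: the network output is the MEAN of the action distribution.
    double mean(const std::vector<double>& s) const {
        double y = b;
        for (std::size_t i = 0; i < W.size(); ++i) y += W[i] * s[i];
        return y;
    }

    // p_theta(a | s): Gaussian density N(mean(s), sigma^2) evaluated at action a.
    double prob(double a, const std::vector<double>& s) const {
        const double pi = 3.14159265358979323846;
        const double mu = mean(s);
        const double sigma = std::exp(log_std);
        const double z = (a - mu) / sigma;  // distance from the mean in units of sigma
        return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
    }
};
```

With W = {1, 2}, b = 0.5 and log_std = 0 (so σ = 1), the state {1, 1} gives μ = 3.5, and the density is highest (1/√(2π) ≈ 0.399) when the action equals that mean. The parameters θ are W, b and log_std, so changing them changes $p_\theta(a_t \mid s_t)$.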

mhubii commented 4 years ago

You won't find this exact formula in the code, only the probability of taking an action here. The log-probability is computed from how far the action lies from the current distribution. I think the formula in this PDF merely illustrates a property of a Markov chain: each action does not depend on the previous states, only on the current state. Hope this helps.
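To make "how far the action lies from the current distribution" concrete, here is a minimal sketch assuming a diagonal Gaussian policy, where `mu` and `log_std` stand in for what the policy network outputs. The function name `gaussian_log_prob` is hypothetical, and plain `std::vector<double>` replaces `torch::Tensor` so the example is self-contained; it is not the repository's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Log-probability of an action under a diagonal Gaussian N(mu, sigma^2).
// mu and log_std are assumed to come from a policy network (hypothetical inputs).
// Each dimension contributes -(a - mu)^2 / (2 sigma^2) - log(sigma) - 0.5 log(2 pi),
// so the further the action lies from the mean (in units of sigma), the lower the value.
double gaussian_log_prob(const std::vector<double>& action,
                         const std::vector<double>& mu,
                         const std::vector<double>& log_std) {
    const double log_sqrt_2pi = 0.5 * std::log(2.0 * 3.14159265358979323846);
    double total = 0.0;
    for (std::size_t i = 0; i < action.size(); ++i) {
        const double var = std::exp(2.0 * log_std[i]);  // sigma^2
        const double diff = action[i] - mu[i];
        // Independent dimensions: the joint log-prob is the sum of per-dimension terms.
        total += -(diff * diff) / (2.0 * var) - log_std[i] - log_sqrt_2pi;
    }
    return total;
}
```

An action exactly at the mean with σ = 1 gives −½ log(2π) ≈ −0.919 per dimension; moving it one standard deviation away lowers the value by a further 0.5. This log-probability is the code-level counterpart of $\log p_\theta(a_t \mid s_t)$.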

fatalfeel commented 4 years ago

@mhubii thanks. Even in the PyTorch layers I still cannot find a formula like $p_\theta(a_t \mid s_t)$ or $p_{\theta'}(a_t \mid s_t)$. So in PyTorch, is the PPO formula just a kind of conditional probability? Am I right?
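In PPO, $p_{\theta'}(a_t \mid s_t)$ is the probability under the older policy that collected the data, and it enters the objective only through the ratio $p_\theta(a_t \mid s_t) / p_{\theta'}(a_t \mid s_t)$. A minimal sketch of how the two log-probabilities combine into the clipped surrogate objective, using scalars instead of `torch::Tensor`s; the function names are hypothetical and ε = 0.2 is assumed as a commonly used default:

```cpp
#include <algorithm>
#include <cmath>

// Probability ratio r_t = p_theta(a_t|s_t) / p_theta'(a_t|s_t), computed from
// log-probabilities as exp(log p_new - log p_old) for numerical stability.
double probability_ratio(double new_log_prob, double old_log_prob) {
    return std::exp(new_log_prob - old_log_prob);
}

// Clipped surrogate objective for one sample (requires C++17 for std::clamp):
// min(r * A, clip(r, 1 - eps, 1 + eps) * A). Training maximizes its mean,
// which is the same as minimizing its negation as a loss.
double clipped_surrogate(double new_log_prob, double old_log_prob,
                         double advantage, double eps = 0.2) {
    const double r = probability_ratio(new_log_prob, old_log_prob);
    const double clipped_r = std::clamp(r, 1.0 - eps, 1.0 + eps);
    return std::min(r * advantage, clipped_r * advantage);
}
```

For example, if the new policy doubles the probability of an action (0.25 to 0.5) with advantage 1, the ratio is 2 but the objective is capped at 1.2; with advantage −1 the unclipped −2 is kept, so the update is not rewarded for moving far from $p_{\theta'}$.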