ikostrikov / pytorch-trpo

PyTorch implementation of Trust Region Policy Optimization
MIT License
433 stars, 91 forks

compute the Fisher-Vector Product #8

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Hello, I have two questions.

First, about line 67 of your trpo.py: differentiating the KL twice gives two terms, and the TRPO paper says the second term vanishes. Is that right? You also add v*damping. I guess its purpose is to ensure positive definiteness? Could you explain this in detail?

Second, about line 117 of your main.py: could you explain in detail why this can approximate the average KL?

Thank you very much!
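For readers following the first question, here is a hedged sketch of the standard argument, assuming the "two terms" are the ones produced by the chain rule in the appendix of the TRPO paper. The symbols are notation chosen for this note, not taken from the repository: $\mu_\theta(s)$ is the vector of distribution parameters output by the policy network, $\mathrm{kl}(\mu)$ is the KL divergence from the old distribution to the one with parameters $\mu$, $J$ is the Jacobian of $\mu$ with respect to $\theta$, and $\lambda$ is the damping coefficient.

```latex
\begin{align*}
\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,
  \mathrm{kl}\bigl(\mu_\theta(s)\bigr)
&= \sum_{a,b}
   \frac{\partial\mu_a}{\partial\theta_i}\,
   \frac{\partial\mu_b}{\partial\theta_j}\,
   \mathrm{kl}''_{ab}
 \;+\; \sum_{a}
   \frac{\partial^2\mu_a}{\partial\theta_i\,\partial\theta_j}\,
   \mathrm{kl}'_{a} \\[4pt]
% At theta = theta_old the KL is at its minimum (zero), so its first
% derivative with respect to mu is zero and the second sum drops out.
\mathrm{kl}'_{a}\Big|_{\theta=\theta_{\mathrm{old}}} = 0
\quad&\Longrightarrow\quad
H = J^{\top} M J, \qquad M_{ab} = \mathrm{kl}''_{ab} \\[4pt]
% Damping adds a multiple of the identity to the matrix-vector product.
(H + \lambda I)\,v &= H v + \lambda v, \qquad \lambda > 0
\end{align*}
```

Because $M$ is positive semidefinite, $H = J^{\top} M J$ is only guaranteed to be positive semidefinite. Adding $\lambda I$ makes the matrix strictly positive definite, which is the condition conjugate gradient needs. This is consistent with the guess in the question, though it is a general property of damping rather than a statement confirmed by the repository author.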

jastfkjg commented 6 years ago

Hello, have you found the answer? I have the same questions about v*damping and the get_kl() function. Thank you!
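The thread ends without an answer, so the following is a minimal, hedged sketch of how a `get_kl()`-style function and a damped Fisher-vector product are commonly written in PyTorch. It is not the repository's code: the toy linear Gaussian policy, the tensor names (`states`, `weight`, `log_std`), the function name `fisher_vector_product`, and the damping value are all assumptions made for illustration. The key idea is that the KL is taken between the current policy and a detached copy of itself. Its value is exactly zero, but its second derivative with respect to the parameters is the Fisher matrix, so only its derivatives are used.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: linear Gaussian policy with a fixed unit std.
states = torch.randn(64, 3)                      # batch of 64 observations
weight = torch.randn(3, 2, requires_grad=True)   # policy parameters
log_std = torch.zeros(2)                         # fixed, not trained here


def get_kl():
    """Per-state KL(pi_old || pi), where pi_old is a detached copy of pi."""
    mean1 = states @ weight
    std1 = log_std.exp()
    # Detached copies play the role of the "old" policy: same values,
    # but no gradient flows through them.
    mean0, log_std0, std0 = mean1.detach(), log_std.detach(), std1.detach()
    kl = (log_std - log_std0
          + (std0.pow(2) + (mean0 - mean1).pow(2)) / (2.0 * std1.pow(2))
          - 0.5)
    return kl.sum(1)


def fisher_vector_product(v, damping=0.1):
    """Return (H + damping * I) v, with H the Hessian of the mean KL."""
    kl = get_kl().mean()
    # First backward pass, keeping the graph so it can be differentiated again.
    (grad_kl,) = torch.autograd.grad(kl, weight, create_graph=True)
    # Differentiating (grad_kl . v) gives the Hessian-vector product H v.
    grad_dot_v = (grad_kl.reshape(-1) * v).sum()
    (hvp,) = torch.autograd.grad(grad_dot_v, weight)
    return hvp.reshape(-1) + damping * v


v = torch.randn(6)
fvp = fisher_vector_product(v, damping=0.1)

# For this toy policy the Fisher matrix has a closed form, used as a check:
# F = kron(E[s s^T], I) for a row-major flattening of `weight`.
fisher = torch.kron(states.T @ states / states.shape[0], torch.eye(2))
expected = fisher @ v + 0.1 * v
print("max abs difference:", (fvp - expected).abs().max().item())
```

Under these assumptions, averaging the per-state KL over the sampled states gives a Monte Carlo estimate of the average KL around the current policy, and the double-backward pass avoids ever forming the Fisher matrix explicitly. The closed-form check is only possible because the toy policy is linear.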