Closed peterzcc closed 6 years ago
@peterzcc @flyers After reading the TRPO paper, I found that they use the analytical estimator of the Fisher information matrix (Section 6, Paragraph 1). The traditional estimator (A = G G^T) can also be used; it was tested in the paper and performs similarly to the analytical estimator (Figure 4, Empirical FIM vs. Vine).
I feel that the traditional estimator could be better than the analytical one. G G^T is p.s.d. by construction, while the analytical version can have negative eigenvalues in the non-convex case (a Hessian is positive semi-definite for convex functions, but need not be p.s.d. for non-convex functions). We can replace the analytical estimator in the program with the empirical estimator and test the efficiency and performance.
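To make the comparison concrete, here is a minimal NumPy sketch on a toy problem. It assumes a softmax policy parameterized directly by its logits, which is not the network used in the TRPO code; the helper names (`score`, `empirical_fisher`, `analytic_fisher`) are hypothetical. In this toy case the analytical matrix is the Hessian of the KL divergence at the current parameters, which has the closed form diag(p) - p p^T, so the snippet only shows that G G^T is p.s.d. and agrees with the analytical matrix given enough samples. It does not settle how the two behave in a large non-convex network.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def score(theta, action):
    """Gradient of log pi(action) w.r.t. the logits (toy softmax policy, an assumption)."""
    g = -softmax(theta)
    g[action] += 1.0
    return g

def empirical_fisher(theta, actions):
    """Outer-product estimator: mean of g g^T over the sampled actions."""
    G = np.stack([score(theta, a) for a in actions])   # shape (N, d)
    return G.T @ G / len(actions)

def analytic_fisher(theta):
    """Hessian of KL(pi_old || pi_theta) at theta = theta_old for softmax logits."""
    p = softmax(theta)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
theta = rng.normal(size=4)
actions = rng.choice(4, size=20000, p=softmax(theta))

F_emp = empirical_fisher(theta, actions)
F_ana = analytic_fisher(theta)

# Smallest eigenvalue of the empirical estimator (non-negative up to round-off)
min_eig_emp = np.linalg.eigvalsh(F_emp).min()
# Largest entrywise gap between the two estimators
max_abs_diff = np.abs(F_emp - F_ana).max()
print(min_eig_emp, max_abs_diff)
```

In the actual algorithm the full matrix would not be formed; the conjugate gradient step only needs matrix-vector products, so the empirical version can be applied as G^T (G v) / N.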
So I implemented a TRPO algorithm that tries to replicate the authors' code with the help of TensorFlow. We can run the demo here. I think we should reuse as much existing code as possible and also reduce the dependency on any specific deep learning framework. The algorithm now looks good and can achieve a score of 1000 on Inverted-Pendulum-v1.
Remaining work to be done: