Closed by peterzcc 7 years ago
One part that could be challenging to implement is the Fisher-vector product, which requires a second-order gradient. MXNet doesn't have a symbol that computes gradients, so we may need to call backward() manually. I'm thinking of a way to realize it.
Yes, computing the second-order gradient is the tricky part. It looks like we have to do it manually in MXNet.
One thing I don't understand: rllab's implementation seems to use a different method. However, rllab's conjugate gradient code is not very readable, and I didn't fully understand it.
Automatically computing the Hessian matrix is currently not supported in MXNet. Could we use a simplified version that only needs first-order derivatives?
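One well-known workaround that needs only first-order gradients is a finite-difference approximation of the Hessian-vector product, Hv ≈ (g(θ + εv) − g(θ − εv)) / 2ε. This is a minimal NumPy sketch of that idea (the function names and the toy quadratic are mine, not from this thread or MXNet):

```python
import numpy as np

def hvp_finite_diff(grad_fn, theta, v, eps=1e-5):
    """Approximate the Hessian-vector product H @ v using only
    first-order gradients via a central difference:
    H v ~ (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

# Toy check on f(x) = 0.5 * x^T A x, whose Hessian is exactly A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
grad_fn = lambda x: A @ x          # analytic gradient of f
theta = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
approx = hvp_finite_diff(grad_fn, theta, v)
exact = A @ v
```

In a TRPO setting, grad_fn would be one ordinary backward() pass through the policy network, so no second-order support is needed; the cost is one extra gradient evaluation per product.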
@peterzcc You can refer to the supplementary material of the original TRPO paper. It describes two implementation methods, which differ in how the Fisher-vector product is computed. The first uses the Fisher information matrix directly, which has a simple closed form for some specific distributions. The second uses a more generic Hessian-vector product. rllab's implementation uses the second one, since Theano handles the Hessian computation. In MXNet, however, I think we have to use the first one.
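To make the first method concrete, here is a minimal sketch for a toy case I chose for illustration: a 1-D Gaussian policy with mean w @ s and fixed std sigma. For that family the Fisher information matrix has the closed form F = E[s sᵀ] / σ², so F v needs only first-order quantities and no Hessian (the function name and setup are assumptions, not code from this repo):

```python
import numpy as np

def fisher_vector_product(states, sigma, v):
    """Closed-form Fisher-vector product for a 1-D Gaussian policy
    with mean w @ s and fixed std sigma. Here F = E[s s^T] / sigma^2,
    so F v = mean_i s_i (s_i @ v) / sigma^2 - two matvecs, no Hessian."""
    n = states.shape[0]
    return states.T @ (states @ v) / (n * sigma ** 2)

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 3))   # batch of sampled states
v = np.array([1.0, -2.0, 0.5])
fv = fisher_vector_product(S, sigma=0.5, v=v)

# Sanity check against forming F explicitly (feasible only in low dim):
F = S.T @ S / (1000 * 0.5 ** 2)
```

The point is the ordering of the products: states @ v first, then states.T @ (...), so F itself is never materialized, which is what makes this usable with large parameter vectors.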
I will start implementing the TRPO with GAE algorithm today. The conjugate gradient optimization will first be implemented with the Python API and can be optimized later.
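For reference, the conjugate gradient step TRPO needs can be written in a few lines of plain NumPy; it only ever touches F through a matrix-vector product callback, which is exactly why the Fisher-vector product above is sufficient. This is a generic sketch (names are mine), not the implementation from this repo:

```python
import numpy as np

def conjugate_gradient(fvp, b, n_iters=10, tol=1e-10):
    """Approximately solve F x = b where F is a symmetric positive-definite
    matrix available only through the product fvp(v) = F v.
    In TRPO, F is the Fisher matrix and b is the policy gradient."""
    x = np.zeros_like(b)
    r = b.copy()            # residual b - F x (x = 0 initially)
    p = r.copy()            # current search direction
    rs_old = r @ r
    for _ in range(n_iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)   # step size along p
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:            # residual small enough, stop early
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy check on a small SPD system.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b)
```

In practice TRPO runs only ~10 iterations, since an approximate solve of F x = g is enough for the natural gradient step direction.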