peterzcc / Arena


[New Feature] Supporting R-Operator in MXNet for automatic computation of the Hessian-Vector product and the Hessian matrix #19

Closed sxjscience closed 6 years ago

sxjscience commented 7 years ago

@peterzcc @flyers @WellyZhang @leezu The key to automatic computation of the Hessian-vector product and the Hessian matrix is the R-operator (right multiplication of the Jacobian matrix by an input vector). We can refer to the paper "Fast Exact Multiplication by the Hessian" and Theano's documentation of `Rop` for more details.
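To make the R-operator concrete, here is a minimal NumPy sketch (not MXNet code — `rop_fd` is a hypothetical helper): the R-operator R_v{f}(w) = J_f(w) @ v can be approximated numerically by a central difference along the direction v, which is also a useful reference for testing an exact implementation.

```python
import numpy as np

def rop_fd(f, w, v, eps=1e-6):
    """Approximate the R-operator R_v{f}(w) = J_f(w) @ v by a central
    difference along direction v. A numerical stand-in for the exact
    R-operator described in "Fast Exact Multiplication by the Hessian"."""
    return (f(w + eps * v) - f(w - eps * v)) / (2 * eps)

# Example: f(w) = A @ w has Jacobian A, so R_v{f} = A @ v.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
f = lambda w: A @ w
w = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(rop_fd(f, w, v))  # close to A @ v = [3.0, 6.0]
```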

We can add R_forward and R_backward functions to MXNet that compute R(output) and R(gradient) with respect to some parameter w. The workflow would mirror the traditional forward-backward computation of the gradient: first call net.forward() and net.backward() to get the output/gradient, then call net.R_forward(param, v) and net.R_backward(param, v) to get the Hessian-vector product.
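The key identity behind this workflow is Pearlmutter's trick: H @ v = R_v{∇L}(w), i.e. applying the R-operator to the gradient gives the Hessian-vector product. A minimal NumPy sketch on a toy quadratic loss (the names `hvp_fd` and `grad` are illustrative, not MXNet API):

```python
import numpy as np

def hvp_fd(grad_f, w, v, eps=1e-6):
    """Hessian-vector product H @ v = R_v{grad L}(w), approximated by a
    central difference of the gradient (Pearlmutter's identity)."""
    return (grad_f(w + eps * v) - grad_f(w - eps * v)) / (2 * eps)

# Toy loss L(w) = 0.5 * w^T A w with symmetric A: grad = A @ w, Hessian = A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda w: A @ w
w = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])
print(hvp_fd(grad, w, v))  # close to A @ v = [6.0, 7.0]
```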

To compute the full Hessian matrix, we can call the Hessian-vector product subroutine multiple times, once per basis vector. We can refer to Wikipedia and Theano's implementation of hessian.
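That is, column i of the Hessian is H @ e_i, so dim HVP calls recover the whole matrix. A small sketch under the same toy-quadratic assumptions as above (`full_hessian` is a hypothetical helper):

```python
import numpy as np

def full_hessian(hvp, dim):
    """Assemble the Hessian column by column: column i is hvp(e_i)."""
    H = np.empty((dim, dim))
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = 1.0
        H[:, i] = hvp(e)
    return H

# HVP via central differences of the gradient of L(w) = 0.5 * w^T A w.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda w: A @ w
w = np.array([0.3, -0.7])
eps = 1e-6
hvp = lambda v: (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)
print(full_hessian(hvp, 2))  # close to A
```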

This work requires some understanding of MXNet's internal dependency engine. Comment below if you are interested.

sxjscience commented 7 years ago

This requires a lot of work, so we first need to investigate whether computing the Hessian-vector product is actually necessary. As suggested in https://github.com/peterzcc/Arena/issues/18, it seems we can implement TRPO using the empirical Fisher matrix instead.
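For reference, the empirical Fisher alternative needs no second-order machinery: it is the average outer product of per-sample gradients, F = (1/N) Σ g_i g_iᵀ, and F @ v can be computed without ever materializing F. A hedged NumPy sketch (names are illustrative):

```python
import numpy as np

def empirical_fisher(per_sample_grads):
    """Empirical Fisher matrix F = (1/N) * sum_i g_i g_i^T."""
    G = np.asarray(per_sample_grads)      # shape (N, dim), one gradient per row
    return G.T @ G / G.shape[0]

def fisher_vector_product(per_sample_grads, v):
    """F @ v without forming F explicitly: (1/N) * G^T (G @ v)."""
    G = np.asarray(per_sample_grads)
    return G.T @ (G @ v) / G.shape[0]

G = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # 3 samples, 2 params
v = np.array([1.0, 1.0])
print(np.allclose(empirical_fisher(G) @ v, fisher_vector_product(G, v)))  # True
```

The matrix-free form is what a TRPO-style conjugate-gradient solver would call repeatedly, which is why only Fisher-vector products (not the full matrix) are needed in practice.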