Closed by gudovskiy 5 years ago
The Hessian-vector product is computed with PyTorch's autograd. torch.autograd.grad returns a tuple with one tensor per parameter passed to it, each with the same shape as that parameter. I then flatten these and concatenate them into a single vector.
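For reference, here is a minimal sketch of that pattern. It is not the exact code from the repository: the helper name `hessian_vector_product`, the toy model, and the squared-output loss are assumptions made for illustration. It uses the standard double-backward approach, where the gradient is taken with `create_graph=True` and the scalar product of the flattened gradient with the vector is differentiated again.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Return H @ vec as one flat tensor (hypothetical helper, not the repo's code)."""
    # First pass: one gradient tensor per parameter. create_graph=True keeps
    # the graph so the gradients themselves can be differentiated.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Second pass: d(grad . vec)/d(params) equals H @ vec, again one tensor per parameter.
    hvps = torch.autograd.grad(flat_grad @ vec, params)
    # Concatenate the per-parameter pieces into a single vector.
    return torch.cat([h.reshape(-1) for h in hvps])

# Toy setup, assumed purely for illustration.
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4), torch.nn.Tanh(), torch.nn.Linear(4, 1)
)
params = list(model.parameters())
x = torch.randn(5, 3)
n = sum(p.numel() for p in params)  # total number of scalar parameters
vec = torch.randn(n)

loss = model(x).pow(2).mean()
hvp = hessian_vector_product(loss, params, vec)
```

The second `torch.autograd.grad` call frees the graph, so the loss has to be recomputed before calling the helper again with a different vector.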
I am not sure why you would need to compute only one layer's Hessian at a time. To get it, you already have to backpropagate from the later layers to the current layer, so going layer by layer would just make the total work quadratic in the number of layers.
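As a rough count, assume a network with $L$ layers where backpropagating through any one layer costs one unit (a simplification that ignores differing layer sizes and the forward pass). Reaching layer $k$ from the output means passing through $L - k + 1$ layers, so handling every layer separately costs:

```latex
\sum_{k=1}^{L} (L - k + 1) = \frac{L(L+1)}{2} = O(L^2),
\qquad \text{versus } O(L) \text{ for one pass over all layers at once.}
```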