kohpangwei / influence-release


Why is the Hessian-vector product calculated by mini-batch? #15

Closed tengerye closed 3 years ago

tengerye commented 5 years ago

Hi, I found that the Hessian-vector product in the file genericNeuralNet.py is implemented with mini-batches (https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/genericNeuralNet.py#L529-L537). Why don't we directly compute the exact Hessian-vector product?

markus-beuckelmann commented 5 years ago

You can compute the exact Hessian-vector product, but it turns out that it's just not necessary and would slow down the computation drastically. A mini-batch sample of the HVP within each iteration is sufficient to get good results. I encourage you to do some experiments with varying mini-batch sizes, and let us know what you find out!
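Here is a minimal, hypothetical sketch (in PyTorch, not the repo's TF1 code) of one such iteration: draw a single random mini-batch and compute its HVP with the double-backward (Pearlmutter) trick, so the p x p Hessian is never formed. `model`, `loss_fn`, `dataset`, `batch_size`, and `v` are stand-ins for experimentation, not names from this repo:

```python
import torch
from torch.utils.data import DataLoader

def batch_hvp(model, loss_fn, batch, v):
    """H_batch @ v via two backward passes; never materializes the Hessian."""
    params = [p for p in model.parameters() if p.requires_grad]
    x, y = batch
    loss = loss_fn(model(x), y)
    # First backward pass: g = dL/dtheta, keeping the graph for a second pass.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Scalar g . v, then a second backward pass gives d(g . v)/dtheta = H v.
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    return torch.autograd.grad(gv, params)

# One fresh mini-batch per iteration gives an unbiased estimate of the
# full-dataset HVP (assuming loss_fn averages over the batch).
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
hv_estimate = batch_hvp(model, loss_fn, next(iter(loader)), v)
```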

tengerye commented 5 years ago

@markus-beuckelmann Thank you for your kind reply. I am confused about why the mini-batch way is faster. If we compute over the whole dataset, the loss costs O(n), the Hessian costs O(p^2), and the HVP costs O(p^2). The code divides the dataset into k groups, so the loss costs O(n/k) per iteration while the other terms stay the same. I don't see how it is any faster.
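For what it's worth, the accounting above assumes the p x p Hessian is actually formed. The repo's HVP op appears to use Pearlmutter's trick (two applications of tf.gradients), in which case no O(p^2) object ever exists and the per-iteration cost scales with the batch size instead. A rough comparison, under that assumption:

```
full-dataset HVP :  ~2 backward passes over n examples  ->  O(n * p) time
mini-batch HVP   :  ~2 backward passes over b examples  ->  O(b * p) time
memory           :  O(p) for parameters and gradients, plus activations
                    for b examples; the p x p Hessian is never materialized
```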

kohpangwei commented 3 years ago

In this case, IIRC, it's just to make sure it fits within the GPU memory.
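A minimal sketch of that pattern, under the same hypothetical PyTorch setup as above: the exact full-dataset HVP is recovered as a batch-size-weighted average of per-batch HVPs, so only one batch's activations need to fit on the GPU at a time. This mirrors what the linked loop does, averaging per-batch results:

```python
def full_hvp(model, loss_fn, loader, v):
    """Exact full-dataset HVP, accumulated batch by batch to bound GPU memory."""
    params = [p for p in model.parameters() if p.requires_grad]
    acc = [torch.zeros_like(p) for p in params]
    n = 0
    for batch in loader:
        hv = batch_hvp(model, loss_fn, batch, v)  # from the sketch above
        b = batch[0].shape[0]
        # loss_fn averages over each batch, so weight each batch HVP by its size.
        acc = [a + b * h for a, h in zip(acc, hv)]
        n += b
    # Dividing by n yields the HVP of the mean loss over all n examples.
    return [a / n for a in acc]
```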