timemmert / differential-ml

Tensorflow and Pytorch implementation of differential machine learning (https://arxiv.org/abs/2005.02347, by Brian Huge and Antoine Savine).
MIT License

Speed-up Automatic Derivative of Activation #11

Closed: YannickLimmer closed this issue 2 years ago

YannickLimmer commented 2 years ago

Currently, as seen here, DmlLinear relies on torch.autograd.functional.jacobian.

Whilst we are only interested in the values on the diagonal, the entire matrix is computed. This is time-consuming, and computing just the diagonal more efficiently could yield a speed-up.
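For an element-wise activation the Jacobian is diagonal, so a single Jacobian-vector product with a ones vector recovers the diagonal without ever materialising the full matrix. A minimal sketch of the difference, using torch.autograd.functional and a Softplus activation as an illustrative stand-in (not the repository's actual DmlLinear code):

```python
import torch
from torch.autograd.functional import jacobian, jvp

x = torch.randn(8)
activation = torch.nn.Softplus()  # element-wise, so its Jacobian is diagonal

# Current style: build the full (8, 8) Jacobian, then keep only the diagonal.
full_jac = jacobian(activation, x)
diag_from_full = torch.diagonal(full_jac)

# Cheaper for element-wise activations: one Jacobian-vector product with a
# ones vector returns exactly the diagonal, without building the matrix.
_, diag_from_jvp = jvp(activation, (x,), (torch.ones_like(x),))

assert torch.allclose(diag_from_full, diag_from_jvp)
```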

YannickLimmer commented 2 years ago

This appears to be a well-known problem when considered in full generality.

YannickLimmer commented 2 years ago

I will close this issue again; we are already using the fastest (general) way to differentiate here, because activations such as softmax do not operate element-wise.
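To illustrate why the diagonal shortcut does not apply in general: the Jacobian of softmax has non-zero off-diagonal entries, so only the full matrix captures it. A small check (illustrative, not code from the repository):

```python
import torch
from torch.autograd.functional import jacobian

x = torch.randn(5)
softmax = torch.nn.Softmax(dim=0)

jac = jacobian(softmax, x)  # shape (5, 5)

# d softmax_i / d x_j = -softmax_i * softmax_j for i != j, so the Jacobian
# is not diagonal and a diagonal-only computation would lose information.
print(torch.allclose(jac, torch.diag(torch.diagonal(jac))))  # False
```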

It is better to concentrate on automatically differentiating through the entire model instead of differentiating in each step. (A natural follow-up of #2.)
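For reference, differentiating through the whole model can be done with a single backward pass over the network output rather than per-layer Jacobians; the model and shapes below are illustrative, not the repository's DmlLinear stack:

```python
import torch

# Hypothetical end-to-end setup: a small feed-forward network standing in
# for the actual model.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 16),
    torch.nn.Softplus(),
    torch.nn.Linear(16, 1),
)

x = torch.randn(32, 4, requires_grad=True)
y = model(x)  # shape (32, 1)

# One backward pass gives dy/dx for every sample (the differential labels
# that differential ML trains on), with no per-layer Jacobians needed.
(dy_dx,) = torch.autograd.grad(y.sum(), x, create_graph=True)
print(dy_dx.shape)  # torch.Size([32, 4])
```

Summing over the batch before calling torch.autograd.grad is valid here because the samples are independent, so each row of dy_dx is that sample's own gradient.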