Closed — YannickLimmer closed this issue 2 years ago
This appears to be a well-known problem when considered in full generality.
I will close this issue again; we are currently using the fastest (general) way to differentiate here. This is because activations such as softmax do not operate element-wise.
It is better to concentrate on automatically differentiating through the entire model instead of differentiating at each step. (A natural follow-up to #2.)
Currently, as seen here, `DmlLinear` relies on `torch.autograd.jacobian`. While we are only interested in the values on the diagonal, the entire Jacobian matrix is computed. This is time-consuming, and a more efficient approach could yield a speed-up.
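To illustrate the cost being discussed: `torch.autograd.functional.jacobian` materializes the full n-by-n matrix, but when the function is element-wise its Jacobian is diagonal, and the diagonal can be recovered with a single backward pass. The sketch below (a minimal illustration, not the project's actual code; `f` is a hypothetical element-wise activation) compares the two. Note that this shortcut is exactly what fails for softmax, since its Jacobian has off-diagonal entries.

```python
import torch

# Hypothetical element-wise activation; its Jacobian is diagonal.
def f(x):
    return torch.tanh(x)

x = torch.randn(5, requires_grad=True)

# Full Jacobian: builds the entire 5x5 matrix even though only
# the diagonal is of interest.
full = torch.autograd.functional.jacobian(f, x)

# Cheap diagonal: one backward pass through sum(f(x)). Valid only
# because f is element-wise; for softmax this would instead return
# the row sums of the Jacobian.
(diag,) = torch.autograd.grad(f(x).sum(), x)

assert torch.allclose(full.diagonal(), diag)
```

For non-element-wise activations such as softmax, no such single-pass shortcut exists in general, which is why the comment above favors the full (general) computation.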