Something I haven't been able to get my head around is the need to find da/dt. Isn't the gradient of the loss function enough for backpropagation? Sorry if it's a trivial doubt.
You can directly backpropagate the gradient through the operations of the ODE solver. However, this is computationally expensive and prone to accumulating numerical error, and its memory consumption scales linearly with the "time" between observations. Instead, you can solve another ODE (the one with da/dt), called the adjoint, backwards in time to compute the backpropagated gradient. This is the main feature of the approach. Hope I understood your question correctly.
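To make the contrast concrete, here is a minimal NumPy sketch (my own illustration, not the paper's or the notebook's code). It assumes a toy linear ODE dz/dt = A z solved with explicit Euler, so the Jacobian df/dz is just A. The adjoint ODE da/dt = -a (df/dz) is then integrated backwards in time, and the result is checked against direct backpropagation through the solver steps:

```python
import numpy as np

# Toy linear dynamics dz/dt = A @ z, so df/dz = A everywhere (assumption
# made for illustration; in general the Jacobian depends on z and t).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
f = lambda z: A @ z

T, N = 1.0, 1000
h = T / N

# Forward pass: explicit Euler from z0 to z(T).
z0 = np.array([1.0, 0.0])
z = z0.copy()
for _ in range(N):
    z = z + h * f(z)

# Loss L = 0.5 * ||z(T)||^2, so dL/dz(T) = z(T).
a = z.copy()  # adjoint state a(t) = dL/dz(t), initialized at t = T

# Adjoint ODE: da/dt = -a @ (df/dz); integrate backwards in time with
# Euler. Only the current a is kept, so memory is O(1) in solver steps.
for _ in range(N):
    a = a + h * (a @ A)  # a(t-h) = a(t) - h * da/dt = a(t) + h * a @ A

print("dL/dz0 via adjoint:        ", a)

# Reference: direct backprop through the Euler steps. Each step is
# z_{k+1} = (I + h*A) z_k, so dL/dz0 = z(T)^T (I + h*A)^N. For a general
# nonlinear f, autodiff through this loop would have to store all N
# intermediate states, which is the memory cost the adjoint avoids.
J_step = np.eye(2) + h * A
grad = z.copy()
for _ in range(N):
    grad = grad @ J_step
print("dL/dz0 via direct backprop:", grad)
```

Both prints agree here. The point is that the adjoint pass only carries the current a (and, in the general nonlinear case, z re-solved backwards alongside it), whereas backpropagating through the solver must keep or recompute every intermediate state.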
Yes. I later found this mentioned as a fine point in your notebook. I think it should be highlighted more. I finally understand this paper. Thanks a lot!