autograd/tf.gradients - is there support?

qwer1304 commented 5 years ago

Hello, I'm trying to implement in MatConvLab/autonn this network implemented here in PyTorch and here in Tenserflow. I need to define a network that uses gradients on-the-fly to calculate some other gradients and updates. Note that access to 2nd order derivative (of loss) is needed at construction time, since

Loss(S; Θ) = F(S_bar; Θ) + β * G(S_breve; Θ') = F(S_bar; Θ) + β * G(S_breve; Θ - α * ∂F(S_bar; Θ)/∂Θ) ∂Loss(S; Θ)/∂Θ = ∂F(S_bar; Θ)/∂Θ + β * ∂G(S_breve; Θ')/∂Θ ∂G(Θ')/∂Θ = ∂G(Θ')/∂Θ' * ∂Θ'/∂Θ, and ∂Θ'/∂Θ = 1 - α * ∂²F(Θ)/∂Θ² ∂Loss(S; Θ)/∂Θ = ∂F(S_bar; Θ)/∂Θ + β * ∂G(S_breve; Θ')/∂Θ' - α*β * ∂G(S_breve; Θ')/∂Θ' * ∂²F(Θ)/∂Θ²

Calculating ∂F(S_bar; Θ)/∂Θ and ∂G(S_breve; Θ')/∂Θ' is simple: just run F(S_bar; Θ) and G(S_breve; Θ') backwards. The problem is how to obtain ∂²F(Θ)/∂Θ²?

In PyTorch, the crucial step is implemented here using autograd.grad during network assembly (before compile!), s.t. proper differentiation occurs during backpropagation. Here is a Tenserflow implementation using tf.gradients.

Can this be done in autonn and if so - how? Thx

PS From what I understand getDer and setDer are run-time methods that provide access to numeric value of a derivative. I need to use 2nd order derivative in network construction, so access to 1st order derivative at build-time is needed.

jotaf98 commented 5 years ago

Hi, I wish there was support for 2nd order derivatives, but unfortunately there isn't at the time.

Essentially, the gradients of the operations need to be written as compositions of differentiable operations too. This is already the case for all math operators, and the layers that are defined in pure Matlab like ReLUs/sigmoids/etc.

However, monolithic layers like convolution or batch-norm are made up of custom C++/CUDA code and do not enjoy the benefits of automatic differentiation for free. I think it's possible to define their 2nd (and Nth) order derivatives without writing custom CUDA code but I didn't follow up on it.

If you only care about MLPs and not CNNs that is not a concern and implementing 2nd order derivatives will be easier, but I don't have any ready-made code for it unfortunately.

qwer1304 commented 5 years ago

Thx for the response. Practically, I'd like to try a network built of LSTM, FC, softmax and later Relu and maybe dropout. How to approach getting 2nd order derivative of such a network within autonn? Thx PS As I understand it, currently there's NO gradient operator which for a given network would return another network which is the gradient of the first, even with some structural limitations imposed on it (e.g., no CNN, etc).

jotaf98 commented 5 years ago

Exactly, currently there is no such operator.

The way it would work in practice is to walk through the layers in backward order (this list can be obtained by running layer.find() with no arguments and reversing the list), and call autonn_der with each layer's forward function to get the corresponding backward function.

It would then call each backward function with Layer objects as inputs, so instead of computing a gradient immediately, autonn_der would give back the list of Layers that implements its contents.

After doing this for all layers in the list you have the composition of Layer objects that expresses the gradient computation, and you can backpropagate through it -- this would be the double gradient. The process can be iterated for Nth order gradients.

This is a fun coding challenge, if you have time to do it step by step (e.g. starting with simple nets and only one or two layers, and building from there). I wish I had time to implement it myself, but I'm happy to assist if someone wants to give it a try.

vlfeat / autonn

autograd/tf.gradients - is there support? #46