Closed: yl4070 closed this issue 2 years ago
Yes, the gradient is taken w.r.t. the input. All parameters in the network also have their gradients computed during the backpropagation. So `g` contains `d loss(X)/dX`, and each layer's parameters hold their own gradients, so you can update them with the standard Flux/Zygote algorithms.
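To make that concrete, here is a minimal sketch of that pattern, assuming the ChainRules integration shown in the talk; the `ActNorm` constructor, `get_params`, and the parameter `.data`/`.grad` fields reflect my reading of the InvertibleNetworks.jl API and may need adjusting to your version:

```julia
using InvertibleNetworks, Flux, LinearAlgebra

# Small invertible layer; ActNorm, get_params, and the .data/.grad
# fields are assumptions based on the InvertibleNetworks.jl API.
AN = ActNorm(4; logdet=false)
X  = randn(Float32, 16, 16, 4, 2)   # (nx, ny, n_channels, batch)

loss(X) = norm(AN.forward(X))^2

# Differentiate w.r.t. the input only ...
ΔX = gradient(X -> loss(X), X)[1]   # ΔX contains d loss / d X

# ... and the same backward pass has already filled in the parameter
# gradients, so a plain gradient-descent step needs no second call:
for p in get_params(AN)
    p.data .-= 1f-3 .* p.grad
end
```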
Thanks for replying. For a Flux layer, do I still need to do another round of gradient calculation w.r.t. the params? Is `FluxBlock` an alternative that avoids that extra round of calculation for the example presented in the slides?
The `FluxBlock` layer is implemented in a similar way to our invertible layers, so you shouldn't need an additional gradient calculation.
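For reference, here is a hedged sketch of how I understand that pattern; the `FluxBlock` constructor and especially the backward signature are assumptions, not confirmed API:

```julia
using InvertibleNetworks, Flux

# Wrap an ordinary Flux model so it composes with the invertible layers.
model = Chain(Conv((3, 3), 4 => 4, relu; pad=1))
FB = FluxBlock(model)                  # constructor assumed

X = randn(Float32, 16, 16, 4, 2)
Y = FB.forward(X)

# Propagating a cotangent ΔY backward returns ΔX and, as a side effect,
# stores the gradients of the wrapped Flux parameters, so no separate
# round of gradient calculation is needed for them.
ΔY = 2f0 .* Y                          # e.g. from loss(Y) = norm(Y)^2
ΔX = FB.backward(ΔY, X)                # signature assumed: (ΔY, X) -> ΔX
```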
Hi, I watched the presentation at JuliaCon, where it was shown that this package can be used with ChainRules. However, I noticed the gradient is not calculated w.r.t. the parameters: the code in the slides is `g = gradient(X -> loss(X), X)`. Is this a typo, or is it designed to work this way? The presentation I was referring to: https://youtu.be/9M-zEGHY4i4?t=1023