Closed: yl4070 closed this issue 2 years ago
Yes, the gradient is taken w.r.t. the input. All parameters in the network also have their gradients computed during the backpropagation. So `g` contains `d loss(X)/dX`, and each layer's parameters hold their own gradients, so you can update them with the standard Flux/Zygote algorithms.
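To make that concrete, here is a minimal sketch of that pattern, assuming the ChainRules integration shown in the talk; the `ActNorm` constructor, `get_params`, and the parameter `.data`/`.grad` fields reflect my reading of the InvertibleNetworks.jl API and may need adjusting to your version:

```julia
using InvertibleNetworks, Flux, LinearAlgebra

# Small invertible layer; ActNorm, get_params, and the .data/.grad
# fields are assumptions based on the InvertibleNetworks.jl API.
AN = ActNorm(4; logdet=false)
X  = randn(Float32, 16, 16, 4, 2)   # (nx, ny, n_channels, batch)

loss(X) = norm(AN.forward(X))^2

# Differentiate w.r.t. the input only ...
ΔX = gradient(X -> loss(X), X)[1]   # ΔX contains d loss / d X

# ... and the same backward pass has already filled in the parameter
# gradients, so a plain gradient-descent step needs no second call:
for p in get_params(AN)
    p.data .-= 1f-3 .* p.grad
end
```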
Thanks for replying. For a Flux layer, do I still need to do another round of gradient calculation w.r.t. the params? Is `FluxBlock` an alternative that avoids that extra round of calculation for the example presented in the slides?
The `FluxBlock` layer is implemented in a similar way to our invertible layers, so you shouldn't need an additional gradient calculation.
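For reference, here is a hedged sketch of how I understand that pattern; the `FluxBlock` constructor and especially the backward signature are assumptions, not confirmed API:

```julia
using InvertibleNetworks, Flux

# Wrap an ordinary Flux model so it composes with the invertible layers.
model = Chain(Conv((3, 3), 4 => 4, relu; pad=1))
FB = FluxBlock(model)                  # constructor assumed

X = randn(Float32, 16, 16, 4, 2)
Y = FB.forward(X)

# Propagating a cotangent ΔY backward returns ΔX and, as a side effect,
# stores the gradients of the wrapped Flux parameters, so no separate
# round of gradient calculation is needed for them.
ΔY = 2f0 .* Y                          # e.g. from loss(Y) = norm(Y)^2
ΔX = FB.backward(ΔY, X)                # signature assumed: (ΔY, X) -> ΔX
```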
Hi, I watched the presentation at JuliaCon, where it was shown that this package can be used with ChainRules. However, I noticed the gradient is not calculated w.r.t. the parameters: the code in the slides is `g = gradient(X -> loss(X), X)`. Is this a typo, or is it designed to work this way? The presentation I was referring to: https://youtu.be/9M-zEGHY4i4?t=1023