zackchase / mxnet-the-straight-dope

An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] Much of this content has been incorporated into the new Dive into Deep Learning book, available at https://d2l.ai/.

Documentation improvement for Automatic Differentiation #77

Open · orchidmajumder opened this issue 7 years ago

orchidmajumder commented 7 years ago

In the documentation section on Head gradients and the chain rule, I think it would be better to explain the context behind the head gradient in a bit more detail. For example, the CS231n class notes explain back-propagation using the notions of an incoming gradient (the gradient on a node's output) and a local gradient, in the "Intuitive understanding of backpropagation" section. If I am correct, the incoming gradient is what is referred to here as the head gradient, and I believe that adding this explanation to the documentation would make it more intuitive for readers.
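To make the mapping concrete, here is a minimal sketch of how I understand it with MXNet's `autograd` (the two-stage split, the variable names, and the function `g` are my own illustration, not the notebook's code):

```python
from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

# First stage: z(x), recorded so we can back-propagate through it later.
with autograd.record():
    z = x * x

# Second stage: pretend a later part of the computation produces g(z) = 3 * z.
# The gradient of g with respect to z is the "incoming gradient" in the
# CS231n notes; MXNet calls it the head gradient.
z_detached = z.detach()
z_detached.attach_grad()
with autograd.record():
    g = 3 * z_detached
g.backward()
head_gradient = z_detached.grad   # dg/dz = [3, 3, 3]

# backward() combines the head gradient with the local gradient dz/dx = 2 * x
# via the chain rule: dg/dx = dg/dz * dz/dx.
z.backward(head_gradient)
print(x.grad)  # [6.0, 12.0, 18.0]
```

If that reading is right, the head gradient is exactly the gradient flowing in from whatever computation sits downstream of `z`.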

Please let me know if my understanding is correct; if so, I will update the documentation and raise a pull request.

zackchase commented 7 years ago

This explanation definitely needs to be clearer. Thanks for pointing it out. I would love for you to take a stab at improving it, and I can always make a pass afterwards to revise the text.

orchidmajumder commented 7 years ago

Sure, thanks a lot. I'll do that.

zackchase commented 7 years ago

Still interested in taking a swing at this, or should I do it?

orchidmajumder commented 7 years ago

Apologies for not updating you on this. Let me try to do it over the weekend, and if I can't find the time to finish it, you can take over.

orchidmajumder commented 7 years ago

Here is a GitHub gist with my first attempt at improving it: https://gist.github.com/orchidmajumder/68fc965cb3e38f8b0daa7fec96285b63

Please let me know if the approach looks fine; I'd then raise a PR incorporating any minor feedback we think is relevant.

orchidmajumder commented 7 years ago

@zackchase, could you please take a look at it? Or could you suggest someone else who can?

jshtok commented 6 years ago

@orchidmajumder Hi, I have read your improved version of the page. The section on "Head gradients and the chain rule" is now much more elaborate, but the code example demonstrating the use of the head gradient has remained untouched. I believe there is a confusion in notation between the text describing the chain rule (which is itself consistent) and the code in block [7]. In particular, the function y(x) in the text is the function z(x) in the code (the internal y(x) in the code is an unfortunate complication here, in my opinion), and the head gradient that z.backward() accepts is not dz/dx (that is the internal gradient) but some dg/dz, the gradient passed back to z from a later stage. Please re-read the last part (starting from "... sometimes when we call the backward ...") and check that it is consistent with the following block [7].
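To illustrate what I mean, here is my own reconstruction of the block [7] example with annotated comments (the exact values in the notebook may differ; the point is what each quantity stands for):

```python
from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with autograd.record():
    y = x * 2       # intermediate quantity; this y is NOT the y(x) of the text
    z = y * x       # z(x) = 2 * x**2 is what the text calls y(x)

# The vector passed to z.backward() is NOT dz/dx (autograd computes that part
# itself); it is dg/dz, the gradient flowing into z from some later stage g
# that is never computed here.
dg_dz = nd.array([10.0, 1.0, 0.1])
z.backward(dg_dz)

# x.grad = dg/dz * dz/dx = dg/dz * 4 * x
print(x.grad)  # [40.0, 8.0, 1.2]
```

If the text's notation and this interpretation of the head gradient were stated side by side in the prose around block [7], I think the confusion would go away.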