Closed: pdudero closed this issue 6 years ago
FYI, some links to demonstrate the problem: https://discussions.udacity.com/t/difficulty-in-understanding-bptt/644784
...the quantity "W" is used before it is defined. Eq. 51 would seem to indicate that W is simply W1+W2+W3. But applying Eq. 52 to Eq. 51 then gives either that y is constant with respect to W, or that 1 = 3. So I can see no motivation at all for Eq. 51. There seems to be no connection between the mystical "accumulation of contributions" and the chain rule, from which it is supposed to arise.
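For what it's worth, one reading that avoids the 1 = 3 contradiction is sketched below. This is an assumption about the intent, not a quote from the lesson: treat W1, W2, W3 as separate copies of the same shared weight W, one per time step, rather than as summands of W. The function f and the index i are notation introduced only for this sketch. With y = f(W_1, W_2, W_3) and W_i = W for each i, the multivariable chain rule would give:

```latex
\frac{dy}{dW}
  = \sum_{i=1}^{3} \frac{\partial y}{\partial W_i}\,\frac{dW_i}{dW}
  = \frac{\partial y}{\partial W_1}
  + \frac{\partial y}{\partial W_2}
  + \frac{\partial y}{\partial W_3},
\qquad \text{since } \frac{dW_i}{dW} = 1 \text{ for each } i.
```

Under that reading the sum runs over gradient contributions, not over the weights themselves, which is presumably what the "accumulation" wording is getting at.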
Thank you for the comment. We will review and update the content accordingly.
Greetings, another student and I had difficulty understanding the "Backpropagation Through Time" equations in the RNN lessons. The problem is that the derivation depends on a form of the chain rule that is not often used: the one for multiple dependent variables, where the output depends on the same quantity through several intermediate variables. There are a couple of places in your video content where you review the chain rule, but the review only covers the single-variable case. It would be very helpful to update the content, however you choose to do it, to make explicit that the BPTT equation derivations are motivated entirely by the multi-dependent-variable form of the chain rule. Currently the content only refers obliquely to "contributions" and "accumulative gradients". Thanks.
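To make the request concrete, here is a minimal sketch of the kind of check that might help students. It is not taken from the lessons: the scalar RNN s_t = tanh(w * s_{t-1} + x_t), the function names, and the input values are all hypothetical. It computes the gradient of the final state with respect to the shared weight as a sum of one chain-rule term per time step, then compares that sum against a finite-difference estimate.

```python
import math

def forward(w, xs, s0):
    """Run a toy scalar RNN s_t = tanh(w * s_{t-1} + x_t); return all states."""
    states = [s0]
    for x in xs:
        states.append(math.tanh(w * states[-1] + x))
    return states

def bptt_grad(w, xs, s0):
    """d s_T / d w as a sum of one contribution per use of the shared weight."""
    states = forward(w, xs, s0)
    grad = 0.0
    upstream = 1.0  # d s_T / d s_t, starting at t = T
    for t in range(len(xs), 0, -1):
        local = 1.0 - states[t] ** 2  # derivative of tanh at step t
        # Contribution of the copy of w used at step t:
        # (d s_T / d s_t) * (partial s_t / partial w, holding s_{t-1} fixed)
        grad += upstream * local * states[t - 1]
        # Propagate one step back: d s_T / d s_{t-1}
        upstream *= local * w
    return grad

def numeric_grad(w, xs, s0, eps=1e-6):
    """Central finite-difference estimate of d s_T / d w."""
    hi = forward(w + eps, xs, s0)[-1]
    lo = forward(w - eps, xs, s0)[-1]
    return (hi - lo) / (2 * eps)

if __name__ == "__main__":
    w, xs, s0 = 0.7, [0.1, -0.3, 0.5], 0.2  # arbitrary illustrative values
    print("sum of contributions:", bptt_grad(w, xs, s0))
    print("finite difference:   ", numeric_grad(w, xs, s0))
```

If the two printed numbers agree closely, that suggests the "accumulated" gradient is just the multivariable chain rule applied to a weight that appears once per time step.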