In recent editions I have introduced gradient estimation as early as M0, where we can start with the easy case of computing gradient estimates on mini-batches.
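To illustrate what such an easy start could look like, here is a minimal sketch. It is not taken from the course material: the 1-D linear model, the squared-error loss, the toy data, and the helper name `mse_gradient` are all assumptions made for this example. It computes the gradient on the full dataset and on mini-batches, and shows that the mini-batch values are noisy estimates whose average (with equally sized batches) recovers the full gradient.

```python
import random


def mse_gradient(w, samples):
    """Gradient w.r.t. w of the mean squared error (1/n) * sum (w*x - y)^2.

    Hypothetical helper: evaluated on all data it gives the exact gradient,
    evaluated on a mini-batch it gives an estimate of that gradient.
    """
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


# Assumed toy data: y = 3x plus a little Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(1000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.1)))

w = 0.0
exact = mse_gradient(w, data)

# Shuffle a copy and split it into equally sized mini-batches.
batch_size = 50
shuffled = data[:]
rng.shuffle(shuffled)
batches = [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

# One gradient estimate per mini-batch.
estimates = [mse_gradient(w, batch) for batch in batches]

# Each estimate is noisy, but their average matches the full gradient
# because the batches partition the data and all have the same size.
mean_estimate = sum(estimates) / len(estimates)

print("full gradient:       ", exact)
print("first batch estimate:", estimates[0])
print("mean of estimates:   ", mean_estimate)
```

Printing a few individual entries of `estimates` next to `exact` makes the variance of a single mini-batch estimate visible, which may be a convenient starting point before moving on to less trivial estimators.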
Note: I find our font size far too big. It becomes inconvenient when we need to break a derivation into several steps and want to leave a comment on the slide (for those who check the slides after the lecture).