
tanelp / tiny-diffusion

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.


What is the reason for calculating the loss and passing it to the backward function as the gradient parameter? #3

Closed: Eliyas0007 closed this issue 4 months ago.

toannguyen1904 commented 8 months ago

Same question.

bkkm78 commented 4 months ago

I am also confused about this. The end result is that gradients are scaled according to the magnitude of the loss. Maybe this is some learning rate scheduling wizardry?
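A quick way to check what passing the loss as `gradient` does is to compare it against two equivalent computations. The sketch below is a hypothetical toy setup (a single `torch.nn.Linear` layer with MSE loss), not the tiny-diffusion model. Because d(0.5·L²)/dθ = L·dL/dθ, `loss.backward(loss)` produces the same gradients as back-propagating `0.5 * loss**2`. In other words, the optimizer ends up following the gradient of a different objective, not a scaled version of the intended one.

```python
import torch

# Hypothetical toy setup: one linear layer and an MSE loss
# (the tiny-diffusion model and data are not reproduced here).
torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
x = torch.randn(8, 2)
y = torch.randn(8, 1)

def compute_loss():
    return torch.nn.functional.mse_loss(model(x), y)

# Variant 1: pass the loss itself as the `gradient` argument.
model.zero_grad()
loss = compute_loss()
loss.backward(loss)  # each param.grad becomes loss * d(loss)/d(param)
grads_scaled = [p.grad.clone() for p in model.parameters()]

# Variant 2: plain backward, then multiply by the loss value by hand.
model.zero_grad()
loss = compute_loss()
loss.backward()
grads_manual = [p.grad * loss.item() for p in model.parameters()]

# Variant 3: backward on 0.5 * loss**2, whose gradient is loss * d(loss)/d(param).
model.zero_grad()
loss = compute_loss()
(0.5 * loss ** 2).backward()
grads_half_sq = [p.grad.clone() for p in model.parameters()]

for a, b, c in zip(grads_scaled, grads_manual, grads_half_sq):
    assert torch.allclose(a, b) and torch.allclose(a, c)
print("loss.backward(loss) matches the gradient of 0.5 * loss**2")
```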

tanelp commented 4 months ago

Hmm, looks like this was committed by mistake. Changed in this commit. Thanks!
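The linked commit is not reproduced here, so what it changed the call to is an assumption. For comparison, the conventional PyTorch idiom calls `loss.backward()` with no argument: for a scalar loss the implicit `gradient` is 1.0, so this is the same as passing `torch.ones_like(loss)`. The sketch below uses the same hypothetical toy model as a stand-in for the real training loop.

```python
import torch

# Hypothetical toy model; the actual tiny-diffusion training loop is not shown here.
torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 2)
y = torch.randn(8, 1)

# Conventional step: for a scalar loss, backward() implicitly uses gradient=1.0.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
grads_default = [p.grad.clone() for p in model.parameters()]

# The same computation with the implicit seed gradient written out explicitly.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward(torch.ones_like(loss))
grads_explicit = [p.grad.clone() for p in model.parameters()]

assert all(torch.allclose(a, b) for a, b in zip(grads_default, grads_explicit))
optimizer.step()  # the update now uses the unscaled gradients of the loss
```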