Open tmquan opened 1 year ago
Predicting the image is a lot harder for the model, so don't expect similar results!
The model in diffusers expects a timestep as the second argument, but since we're training from scratch we can choose to ignore it by always passing 0 as the timestep. In the text, I call out how to change this if you want to add timestep conditioning:

> We can replicate the training shown above using this model in place of our original one. We need to pass both x and timestep to the model (here I always pass t=0 to show that it works without this timestep conditioning and to keep the sampling code easy, but you can also try feeding in (amount*1000) to get a timestep equivalent from the corruption amount).
To change what the network is predicting (the 'target'), this is the relevant line:
```python
loss = loss_fn(pred, x)  # How close is the output to the true 'clean' x?
```
Here we compare the output of the network (pred) with the clean image. If you want to predict the noise instead, you might use loss_fn(pred, noise) (you will then also have to change the sampling method).
Thanks for the notebooks.
I have one comment on this file: https://github.com/huggingface/diffusion-models-class/blob/main/unit1/02_diffusion_models_from_scratch.ipynb
In case I want to make the network predict the clean images, should the pred formula be changed to this, by attaching noise_amount?