Closed XueruiSu closed 1 year ago
Haha, the 2 is for "b", the batch size, not for the loss. Since the reduction parameter in F.mse_loss is set to 'none', the loss would be huge if we only used .sum(), which makes training difficult, and I prefer .sum() to .mean(). So dividing by b ** 2 is used to control the scale of the loss.
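
To make the scaling concrete, here is a minimal sketch of the three options being discussed. It is not the repository's actual training code: the batch size, tensor shapes, and variable names are made up for illustration, and the random tensors stand in for the predicted and target noise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

b = 8  # batch size (hypothetical value for illustration)
pred = torch.randn(b, 3, 32, 32)    # stand-in for the model output
target = torch.randn(b, 3, 32, 32)  # stand-in for the target noise

# With reduction="none", F.mse_loss returns one squared error per element,
# so no extra squaring of the loss is needed.
per_element = F.mse_loss(pred, target, reduction="none")

loss_sum = per_element.sum()                 # grows with batch size and image size
loss_mean = per_element.mean()               # averaged over every element
loss_scaled = per_element.sum() / b ** 2.    # summed, then divided by b squared

print(per_element.shape, loss_sum.item(), loss_mean.item(), loss_scaled.item())
```

Here `b ** 2.` squares the batch size, not the loss, so the result sits between the plain `.sum()` and the plain `.mean()`.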
Thanks for the reply.
F.mse_loss already does the squaring, so maybe we don't need the "**2." operation? The code in question is in the file TrainCondition.py in the DiffusionFreeGuidence directory.