Closed WBS-123 closed 1 year ago
The implementation in our code follows Eq. (7) in the original paper, which is quite general: it only assumes that each task's output follows a Gaussian distribution, and it does not need to know which type of loss function each task uses. In some cases, you can re-derive the weighting for the specific loss functions you use (like Eq. (10) in the original UW paper).
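To make the difference concrete, here is a minimal pure-Python sketch rather than the repository's actual code. It assumes that `loss_scale` stores s_i = log(sigma_i^2) for each task, so that `losses/(2*exp(s)) + s/2` equals `losses/(2*sigma^2) + log(sigma)`. The function names, and the `no_half` argument used to drop the factor of 2 for selected tasks, are hypothetical and only illustrate the kind of re-derivation mentioned above.

```python
import math

def uw_loss_general(losses, log_vars):
    # Same denominator 2*sigma_i^2 for every task, with s_i = log(sigma_i^2)
    # (assumed parameterization of loss_scale).
    return sum(l / (2 * math.exp(s)) + s / 2 for l, s in zip(losses, log_vars))

def uw_loss_sigma(losses, sigmas):
    # The same objective written directly in terms of sigma_i.
    return sum(l / (2 * sigma ** 2) + math.log(sigma)
               for l, sigma in zip(losses, sigmas))

def uw_loss_mixed(losses, log_vars, no_half):
    # Hypothetical variant: tasks flagged in no_half use 1/sigma_i^2
    # instead of 1/(2*sigma_i^2), giving the differing denominators
    # described in the question.
    total = 0.0
    for l, s, flag in zip(losses, log_vars, no_half):
        scale = math.exp(s) if flag else 2 * math.exp(s)
        total += l / scale + s / 2
    return total
```

For example, `uw_loss_mixed(losses, log_vars, [False, True])` keeps the factor of 2 for the first task only, while `uw_loss_general` treats all tasks identically.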
Closed as no further updates.
In the original UW paper, the two terms of the objective function have different denominators: according to the paper, the second term differs from the first term in its denominator. But in your code, loss = (losses/(2*self.loss_scale.exp())+self.loss_scale/2).sum(), with no distinction between the denominators of the two terms. Is that correct?