ericyq opened 7 months ago (status: Open)
During training, there is indeed a significant gap between the recognition loss and the analysis losses. In our implementation, we set the weight of each loss to 1. Since the different tasks are computed in separate subnets, performance on a given task is not significantly affected by the scale of the corresponding loss, so it is not necessary to rescale the losses of all tasks to the same scale.
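To illustrate the point about separate subnets, here is a minimal toy sketch, not the repository's actual training code. The function names, the scalar "subnet" parameters, and the loss scales are all hypothetical; each subnet is reduced to a single number so the gradients can be checked by hand. It sums two losses of very different scale with weights of 1 and shows that the gradient reaching the analysis parameter is the same as if the analysis loss were trained alone.

```python
# Toy sketch (all names and numbers are hypothetical, for illustration only).
# Each "subnet" is a single scalar parameter with its own loss.

def recognition_loss(w_rec):
    # Deliberately large-scale loss.
    return 100.0 * (w_rec - 2.0) ** 2

def analysis_loss(w_ana):
    # Deliberately small-scale loss.
    return 0.01 * (w_ana + 1.0) ** 2

def total_loss(w_rec, w_ana, weight_rec=1.0, weight_ana=1.0):
    # Plain weighted sum; both weights default to 1 as described above.
    return weight_rec * recognition_loss(w_rec) + weight_ana * analysis_loss(w_ana)

def grad(f, x, eps=1e-6):
    # Central finite-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

w_rec, w_ana = 0.5, 0.5

# Gradient of the summed loss with respect to the analysis parameter.
g_ana_total = grad(lambda w: total_loss(w_rec, w), w_ana)
# Gradient of the analysis loss on its own.
g_ana_alone = grad(analysis_loss, w_ana)

print(g_ana_total, g_ana_alone)
```

The two gradients agree because the recognition loss does not depend on the analysis parameter. This only holds for parameters that belong exclusively to one subnet; any layers shared between tasks would receive gradients from every loss, and there the relative scale could matter.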
How should the difference in loss scale be handled?