Hi, han. I'm really interested in your method and I have a few questions.
When fine-tuning on other datasets, the training loss keeps decreasing, but the MAE and MSE stay flat.
I used the kl_loss from your code as the loss function.
Hi, sorry I missed this issue before you closed it.
Any update?
Perhaps it is an implementation problem in the MAE/MSE computation (e.g. one could accidentally feed mismatched or shuffled labels to the MAE/MSE functions, so the metrics never improve even though the loss goes down).
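A minimal, hypothetical sketch (not from this repo) of what that mismatch can look like: when predictions and labels line up, MAE/MSE track model quality, but if the labels are accidentally shuffled or taken from the wrong loader, MAE/MSE stay roughly constant no matter how low the training loss gets. The `mae_mse` helper below is just for illustration.

```python
import torch

def mae_mse(preds: torch.Tensor, labels: torch.Tensor):
    """Compute MAE and MSE on matching prediction/label pairs."""
    assert preds.shape == labels.shape, "preds and labels must line up"
    err = preds - labels
    return err.abs().mean().item(), (err ** 2).mean().item()

# Simulated evaluation batch: `preds` from the model, `labels` from the loader.
preds = torch.randn(32, 1)
labels = preds + 0.01 * torch.randn(32, 1)  # predictions close to the true labels

# Correct pairing: MAE/MSE are small and improve as the model improves.
print(mae_mse(preds, labels))

# Accidental mismatch (e.g. labels shuffled or from a different loader):
# MAE/MSE look "stuck" even while the training loss decreases.
wrong_labels = labels[torch.randperm(labels.size(0))]
print(mae_mse(preds, wrong_labels))
```

It may be worth printing a few (prediction, label) pairs from the evaluation loop to confirm they actually correspond to the same samples.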