george66s closed this issue 2 years ago
Sorry, it seems I didn't read the paper carefully enough. The linear_loss and the cluster_loss are trained for evaluation purposes. Thanks!
@george66s Yes, you are right. The linear loss and cluster loss belong to the probes, which are trained during the initial training phase because it's simpler that way and lets us get a sense of how the model is doing during training. These probes don't affect the actual model, though, because they are trained on detached tensors.
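To illustrate the point about detached tensors, here is a minimal PyTorch sketch. It is not the repository's code: `backbone`, `linear_probe`, and the placeholder `main_loss` are hypothetical stand-ins chosen for illustration. It checks that adding a probe loss computed on `feats.detach()` leaves the backbone's gradients unchanged while the probe itself still receives gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins, not the repository's actual modules.
backbone = nn.Linear(8, 4)      # plays the role of the model being trained
linear_probe = nn.Linear(4, 3)  # plays the role of the linear evaluation probe

x = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

feats = backbone(x)

# Placeholder for the main training objective (assumption: any loss
# that depends on the backbone features works for this demonstration).
main_loss = feats.pow(2).mean()

# The probe sees a detached copy of the features, so its loss cannot
# send gradients back into the backbone.
linear_loss = F.cross_entropy(linear_probe(feats.detach()), labels)

# Backbone gradients under the main loss alone.
grads_main_only = torch.autograd.grad(
    main_loss, list(backbone.parameters()), retain_graph=True
)

# Backbone gradients under the summed loss.
(main_loss + linear_loss).backward()
grads_total = [p.grad for p in backbone.parameters()]

# The probe loss changes nothing for the backbone, but the probe is trained.
backbone_unaffected = all(
    torch.allclose(a, b) for a, b in zip(grads_main_only, grads_total)
)
probe_is_trained = all(p.grad is not None for p in linear_probe.parameters())
print(backbone_unaffected, probe_is_trained)
```

Because the gradients match, removing the probe losses from the sum would not change what the backbone learns in this setup; the probes only report how linearly separable or clusterable the features are.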
@george66s, my linear and cluster losses don't decrease after one epoch. Is that normal? If it is, how can I choose the best model from them?
Hi @mhamilton723, congratulations on the great progress.
Just a question about the loss. In your paper, the loss is defined as follows:
$$\mathcal{L} = \lambda_{self}\,\mathcal{L}_{corr}(x, x, b_{self}) + \lambda_{knn}\,\mathcal{L}_{corr}(x, x_{knn}, b_{knn}) + \lambda_{rand}\,\mathcal{L}_{corr}(x, x_{rand}, b_{rand})$$
In your code, however, there are an additional linear_loss and cluster_loss. Could you explain how those two losses work? Did you do any ablation study on them, i.e. what would the result be if they were excluded from the loss?
Many thanks!