mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences

loss explanation #20

Closed · george66s closed 2 years ago

george66s commented 2 years ago

Hi @mhamilton723, congratulations on the great progress.

Just a question about the loss. In your paper, the loss is the following:

$$\mathcal{L} = \lambda_{\text{self}} \, \mathcal{L}_{\text{corr}}(x, x, b_{\text{self}}) + \lambda_{\text{knn}} \, \mathcal{L}_{\text{corr}}(x, x^{\text{knn}}, b_{\text{knn}}) + \lambda_{\text{rand}} \, \mathcal{L}_{\text{corr}}(x, x^{\text{rand}}, b_{\text{rand}})$$
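For concreteness, here is how I read $\mathcal{L}_{\text{corr}}$ and the three terms, as a rough PyTorch sketch (the function names, tensor shapes, and the mean reduction are my guesses, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def corr_loss(f1, f2, s1, s2, b):
    """Rough reading of L_corr: push segmentation-feature similarities
    toward backbone-feature similarities, shifted by the bias b."""
    # f1, f2: backbone features (B, C, H, W); s1, s2: segmentation features
    f1, f2 = F.normalize(f1, dim=1), F.normalize(f2, dim=1)
    s1, s2 = F.normalize(s1, dim=1), F.normalize(s2, dim=1)
    # Cosine similarity between every pair of spatial locations
    fcorr = torch.einsum("bchw,bcij->bhwij", f1, f2)
    scorr = torch.einsum("bchw,bcij->bhwij", s1, s2)
    # (F - b) sets attraction vs. repulsion; clamping S at 0 stabilizes training
    return -((fcorr - b) * scorr.clamp(min=0)).mean()

def total_loss(f, f_knn, f_rand, s, s_knn, s_rand, lam, b):
    # The three terms of the equation above: self, KNN, and random pairs
    return (lam["self"] * corr_loss(f, f, s, s, b["self"])
            + lam["knn"] * corr_loss(f, f_knn, s, s_knn, b["knn"])
            + lam["rand"] * corr_loss(f, f_rand, s, s_rand, b["rand"]))
```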

In your code, however, there are additionally a linear_loss and a cluster_loss. Could you explain how those two losses work? Did you do any ablation study on them, i.e., what would the result be if they were excluded from the total loss?

Lots of thanks!

george66s commented 2 years ago

Sorry, it seems I didn't read the paper carefully enough. The linear_loss and the cluster_loss are trained for evaluation purposes. Thanks!

mhamilton723 commented 2 years ago

@george66s yes, you are right. The linear and cluster losses train the probes during the initial training phase because it's simpler that way and lets us get a sense of how the model is doing during training. These probes don't affect the actual model, though, because they are trained on detached tensors.
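Roughly like this (a minimal illustrative sketch, not the repo's actual code; the backbone, probe, and shapes are stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# .detach() cuts the autograd graph, so probe gradients never reach the model
backbone = nn.Conv2d(3, 64, 3, padding=1)  # stand-in for the model
linear_probe = nn.Conv2d(64, 27, 1)        # per-pixel linear classifier

images = torch.randn(2, 3, 32, 32)
labels = torch.randint(0, 27, (2, 32, 32))

feats = backbone(images)
logits = linear_probe(feats.detach())      # gradient flow stops here
linear_loss = F.cross_entropy(logits, labels)
linear_loss.backward()                     # only the probe gets gradients

assert backbone.weight.grad is None        # the model is untouched
```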

Holmes2002 commented 1 year ago

@george66s, my linear and cluster losses don't decrease after the first epoch. Is everything fine? If it is, how can I choose the best model?