Open howardyclo opened 5 years ago
(Figure credit: Zhou et al.)
Loss overview:
Co-peak loss:
Affinity loss:
(Note that this equation was corrected by the author and differs from the one in the original paper.)
In the first term, if p and q are both salient pixels, the product of their saliency values, \hat{S}_n(p) \times \hat{S}_m(q), should be high. A high affinity between p and q should therefore be enforced, so we want T_{nm}(p,q) to be higher. Because this is a minimization problem, we minimize (1 - T_{nm}(p,q)) instead.
In the second term, if p is a salient pixel and q is a background pixel, the difference of their saliency values, \hat{S}_n(p) - \hat{S}_m(q), should be high. A low affinity between p and q should therefore be enforced, so we want T_{nm}(p,q) to be lower.
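The two terms described above can be sketched for a single pixel pair as follows. This is only an illustration of the verbal description, not the paper's exact equation: the function name `affinity_loss_pair`, the weight `alpha`, the use of an absolute difference, and the assumption that saliency and affinity values lie in [0, 1] are all hypothetical choices made here.

```python
def affinity_loss_pair(s_p, s_q, t_pq, alpha=1.0):
    """Per-pair affinity penalty, following the verbal description only.

    s_p, s_q : saliency values of pixels p and q (assumed to lie in [0, 1])
    t_pq     : affinity between p and q (assumed to lie in [0, 1])
    alpha    : hypothetical weight balancing the two terms
    """
    # First term: both pixels salient -> product is high -> penalize low affinity.
    attract = s_p * s_q * (1.0 - t_pq)
    # Second term: one salient, one background -> difference is high -> penalize
    # high affinity. The absolute value is an assumption made for symmetry.
    repel = alpha * abs(s_p - s_q) * t_pq
    return attract + repel


# Two salient pixels: the loss vanishes only when the affinity is high.
print(affinity_loss_pair(1.0, 1.0, 1.0), affinity_loss_pair(1.0, 1.0, 0.0))
# Salient vs. background: the loss vanishes only when the affinity is low.
print(affinity_loss_pair(1.0, 0.0, 0.0), affinity_loss_pair(1.0, 0.0, 1.0))
```

In this sketch each term is zero exactly when the affinity agrees with what the saliency values suggest, which is the behaviour the two paragraphs above describe.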
The proposed affinity loss generalizes eq (4) to consider both inter-image and intra-image affinities:
Saliency loss (follows Hsu et al.):
How is the result computed given 3 images? After optimizing Eq. (1), we simply use the detected peaks on the estimated co-saliency maps as the final co-peaks, because detecting the co-peaks on all possible image pairs is complicated.
How are the (local) peaks sampled from the predicted co-saliency map for an image? They use a 3 x 3 local window to sample peaks on the co-saliency map; the sampled peaks are then fed to the peak back-propagation proposed in PRM.
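A minimal sketch of 3 x 3 local-maximum sampling on a co-saliency map is given below. It covers only the peak-sampling step, not the peak back-propagation from PRM. The function name `local_peaks` and the `threshold` parameter are assumptions added for illustration, and the map is represented as a plain list of lists.

```python
def local_peaks(saliency, threshold=0.0):
    """Return (row, col) positions that are the maximum of their 3x3 window.

    saliency  : 2D list of floats (the predicted co-saliency map)
    threshold : hypothetical cut-off; values at or below it are never peaks
    """
    h, w = len(saliency), len(saliency[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = saliency[r][c]
            if v <= threshold:
                continue
            # Collect the 3x3 neighbourhood, clipped at the image border.
            window = [
                saliency[i][j]
                for i in range(max(0, r - 1), min(h, r + 2))
                for j in range(max(0, c - 1), min(w, c + 2))
            ]
            # Ties on a plateau are all reported as peaks in this sketch.
            if v >= max(window):
                peaks.append((r, c))
    return peaks


demo = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
print(local_peaks(demo))
```

On the small `demo` map only the centre pixel dominates its 3 x 3 window, so it is the single sampled peak.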