Closed beiyan1911 closed 5 days ago
Thank you for your interest in our work. We will update with a more complete version of the TASS code next week, with the hope that it will address your issue. The current implementation of the class attention map is fairly basic.
Thank you for releasing the code of this excellent work. Section 3.2 (Boundary-guided Mid-level Saliency Drift Regularization) of the paper states: "the mid-level saliency maps of our model are generated using GradCAM [42] at three stages of the CNN backbone", but in the code we only see `gradcam_net` implemented as a single convolution layer per stage (shown below). I would like to ask whether there is a special consideration here. Also, the saliency maps produced by GradCAM appear to be detached from the computation graph, i.e., they carry no gradient, so I am confused about how the model uses the saliency map S(x,j) to construct the loss L^{dbs}_{t}(x). I look forward to your response.
ResNet.py

```python
# line 186:
self.gradcam_net = nn.ModuleList([nn.Conv2d(in_c, 1, 3, padding=1) for in_c in [128, 256, 512]])
...
# line 259:
for i in range(len(intermediate_x)):
    # line 260:
    intermediate_x[i] = self.gradcam_net[i](intermediate_x[i])
```
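For reference, the GradCAM procedure the paper names is usually differentiable as long as the class-score gradients are taken with `create_graph=True`. Below is a minimal, self-contained sketch of standard Grad-CAM applied at multiple stages of a CNN; the `TinyBackbone` model, stage names, and class index are hypothetical stand-ins, not the TASS code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Hypothetical two-stage CNN standing in for the ResNet backbone."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, 3, padding=1)
        self.stage2 = nn.Conv2d(8, 16, 3, padding=1)
        self.head = nn.Linear(16, 10)

    def forward(self, x):
        f1 = F.relu(self.stage1(x))
        f2 = F.relu(self.stage2(f1))
        logits = self.head(f2.mean(dim=(2, 3)))  # global average pool + linear
        return logits, [f1, f2]                  # intermediate feature maps

def grad_cam(logits, feats, class_idx):
    """Standard Grad-CAM: weight each channel of a feature map by the
    spatially pooled gradient of the class score, sum over channels, ReLU.
    create_graph=True keeps the maps differentiable for use inside a loss."""
    score = logits[:, class_idx].sum()
    grads = torch.autograd.grad(score, feats, create_graph=True)
    maps = []
    for f, g in zip(feats, grads):
        w = g.mean(dim=(2, 3), keepdim=True)          # per-channel weights
        cam = F.relu((w * f).sum(dim=1, keepdim=True))  # (B, 1, H, W)
        maps.append(cam)
    return maps

x = torch.randn(2, 3, 32, 32)
model = TinyBackbone()
logits, feats = model(x)
cams = grad_cam(logits, feats, class_idx=0)
print([tuple(c.shape) for c in cams])  # one (B, 1, H, W) map per stage
```

This contrasts with the snippet above, where a learned 3x3 convolution maps each stage's features directly to a one-channel map; that projection is trainable end-to-end, which may be why it was used in place of the gradient-based formulation.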