haofanwang / Score-CAM

Official implementation of Score-CAM in PyTorch
MIT License

Coefficients of activation maps #20

Closed nicolebussola closed 2 years ago

nicolebussola commented 2 years ago

Hi, I was looking at the computation of coefficients for the activation maps:

              # how much increase if keeping the highlighted region
              # prediction on masked input
              output = self.model_arch(input * norm_saliency_map)
              output = F.softmax(output, dim=1)  # softmax over the class dimension
              score = output[0][predicted_class]

              score_saliency_map += score * saliency_map

In the paper (and in the code comment), you refer to the Increase of Confidence, so the score should be computed as the difference between the score of the raw input and the score of the masked input. In this implementation, however, the score appears to be just the prediction on the masked input. Am I missing something?

Thank you, Nicole

haofanwang commented 2 years ago

@nicolebussola

Hi, Nicole, thanks for your interest.

As we mentioned in Section 4 of the paper, we set the baseline input (X_b) to zero for simplicity. In our implementation we set f(X_b) to zero directly, since it is a constant regardless of which X_b is chosen. In our observations the difference is minor. Hope this helps.
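To make the effect of the baseline term concrete, here is a minimal NumPy sketch on toy data (not the repository's code). The names `weighted_sum`, `softmax`, `maps`, `scores`, and `baseline` are illustrative assumptions. It checks two things: subtracting a constant f(X_b) from every channel score shifts the pre-ReLU map by f(X_b) times the sum of the activation maps, and the offset cancels exactly if the channel scores are additionally normalized with a softmax across channels.

```python
import numpy as np


def weighted_sum(maps, weights):
    """Linear combination of activation maps, before any ReLU.

    maps:    array of shape (K, H, W)
    weights: array of shape (K,)
    """
    return (weights[:, None, None] * maps).sum(axis=0)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


rng = np.random.default_rng(0)
maps = rng.random((4, 3, 3))   # toy activation maps, values in [0, 1)
scores = rng.random(4)         # toy per-channel scores on the masked inputs
baseline = 0.1                 # hypothetical constant f(X_b)

without_baseline = weighted_sum(maps, scores)
with_baseline = weighted_sum(maps, scores - baseline)

# The two variants differ by baseline * (sum of all maps), pixel by pixel.
offset = without_baseline - with_baseline
print(np.allclose(offset, baseline * maps.sum(axis=0)))

# A softmax across channels is invariant to a constant shift of the scores.
print(np.allclose(softmax(scores), softmax(scores - baseline)))
```

Whether the offset matters in practice depends on how large f(X_b) is relative to the masked-input scores, which this toy example does not try to answer.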

nicolebussola commented 2 years ago

Dear Haofan, thank you for your quick reply.

We find your method very reliable and robust, but we have some questions that we have been working on for some time, and we think you could help us. Let me briefly explain our problem: we have a binary classifier that detects the presence or absence of a certain feature. As an example, suppose we have images with a bike and a person and images with only a bike, and we are interested in finding the images with people in them.

We would really appreciate your opinion,

Best

haofanwang commented 2 years ago

@nicolebussola

If I understand correctly, you have a binary classifier (w/wo person) in your example.

For Q1, the heatmap for "No Person" may well be messy.

For Q2, one possible reason is that your model is not discriminative enough. In the ideal case, the result of Score-CAM can work as a coarse detector. I'm not sure why you get two similar maps. I suggest you visualize the activation maps and their linear weights for each class (w/wo).
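One way to inspect the linear weights per class is sketched below in NumPy. It is only an illustration under stated assumptions: `predict` is a hypothetical stand-in for the trained model (here a toy linear map from pixels to two logits), `channel_weights` is a made-up helper name, and the activation maps are random arrays assumed to be already upsampled and normalized to [0, 1].

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def channel_weights(predict, image, maps, class_idx):
    """Softmax score of `class_idx` on the image masked by each map.

    predict: callable mapping an (H, W) image to a 1-D array of logits
    maps:    array of shape (K, H, W), assumed normalized to [0, 1]
    """
    return np.array([softmax(predict(image * m))[class_idx] for m in maps])


# Toy two-class "model": logits are a linear function of the pixels.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))


def predict(img):
    return W @ img.ravel()


image = rng.random((4, 4))
maps = rng.random((5, 4, 4))

w_person = channel_weights(predict, image, maps, class_idx=1)
w_no_person = channel_weights(predict, image, maps, class_idx=0)

for k, (a, b) in enumerate(zip(w_person, w_no_person)):
    print(f"channel {k}: person={a:.3f}  no_person={b:.3f}")
```

With a two-logit softmax head, the two weights of each channel sum to one, so channels whose weights sit near 0.5 contribute almost equally to both heatmaps. That is one way two similar maps could arise, though it is not necessarily what happens with your model.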

haofanwang commented 2 years ago

Not active. Closed.

lorenz-gorini commented 2 years ago

I am @nicolebussola's colleague. Thank you very much. Your insights have been very useful, and we are looking into them. If we get any interesting results, we will give you an update.