haofanwang / Score-CAM

Official implementation of Score-CAM in PyTorch
MIT License

batchwise implementation of scorecam #10

Closed umairjavaid closed 3 years ago

umairjavaid commented 3 years ago

This implementation can generate Score-CAM activation maps for multiple images in a batch. It is 25x faster than the original implementation because it feeds the masked images (the inputs multiplied by the normalized activation maps) to the model through a data loader instead of a Python for loop.
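
To illustrate the difference (the model, mask, and function names below are just placeholders for this sketch, not the actual code in this PR): the original approach scores one masked image per forward pass, while the batched approach stacks all masked images and scores them chunk by chunk through a data loader.

import torch
from torch.utils.data import DataLoader

def score_masks_loop(model, image, masks, target):
    # original style: one forward pass per activation map
    scores = []
    for m in masks:                                    # masks: list of (1, 1, H, W) maps
        with torch.no_grad():
            logits = model(image * m)                  # image: (1, C, H, W)
        scores.append(torch.softmax(logits, dim=1)[0, target])
    return torch.stack(scores)

def score_masks_batched(model, image, masks, target, chunk=50):
    # batched style: build every masked image up front, then score them in chunks
    masked = torch.cat([image * m for m in masks], dim=0)    # (num_maps, C, H, W)
    loader = DataLoader(masked, batch_size=chunk, shuffle=False)
    scores = []
    with torch.no_grad():
        for batch in loader:
            scores.append(torch.softmax(model(batch), dim=1)[:, target])
    return torch.cat(scores)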

haofanwang commented 3 years ago

Hi, @umairjavaid

Thanks for your implementation. Before I merge it into the main branch, could you take a look at this PR? It seems to be doing the same thing.

umairjavaid commented 3 years ago

(This is a great paper, and I am really happy to be contributing to it.) I haven't read that whole implementation, but the part where the feature-multiplied images are fed to the model uses a for loop, which is much slower than using a data loader. I also changed how the activations are normalized: in my code the entire batch of feature maps is normalized at once with vectorized PyTorch operations. Kindly take a look at the difference.
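
As a small illustration of the normalization idea (this sketch uses an epsilon guard for simplicity; the activation_wise_normalization method shown later in this thread drops constant maps instead):

import torch

def normalize_maps(activations, eps=1e-8):
    # activations: (N, H, W); min-max normalize every map in one vectorized step
    n, h, w = activations.shape
    flat = activations.view(n, -1)
    min_ = flat.min(dim=1, keepdim=True)[0]
    max_ = flat.max(dim=1, keepdim=True)[0]
    norm = (flat - min_) / (max_ - min_ + eps)   # eps guards against constant maps
    return norm.view(n, h, w)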

haofanwang commented 3 years ago

Cool. If possible, could you also add a Colab demo for Score-CAM? It would be helpful.

umairjavaid commented 3 years ago

I can't provide the demo. You can use the code I provided.

haofanwang commented 3 years ago

Hi @umairjavaid, could you write a test file just like this?

Tgaaly commented 3 years ago

I tried to use this scorecam-batchwise code and ran into a couple of problems. First, there is a typo on line 76: it should be logit = self.model_arch(imgs).cuda(), not logit = self.model_arch(inputs).cuda().

Second, when backward() is called in the forward() of scorecam-batchwise.py, it only works on scalar outputs (i.e. a single data sample, not a batch), so this error arises: RuntimeError: grad can be implicitly created only for scalar outputs
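
For reference, that error is standard PyTorch behaviour and easy to reproduce with a tiny standalone snippet (unrelated to the PR code itself):

import torch

x = torch.randn(4, requires_grad=True)
y = x * 2                          # non-scalar output

# y.backward()                     # RuntimeError: grad can be implicitly created only for scalar outputs
y.sum().backward()                 # works: reduce to a scalar first
# alternatively: y.backward(gradient=torch.ones_like(y))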

I'm not sure how to use this, so a test file would be necessary here.

umairjavaid commented 3 years ago

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data

# initialize_weights and normalize_tensor are helpers defined elsewhere in my code (not shown here).
class myModel15(nn.Module):
    def __init__(self, features, num_classes=1000, **kwargs):
        super(myModel15, self).__init__()
        self.features = features
        self.conv6 = nn.Conv2d(512,  1024, kernel_size=3, padding=1) 
        self.conv7 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.conv8 = nn.Conv2d(512,  1024, kernel_size=3, padding=1) 
        self.conv9 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.conv10 = nn.Conv2d(512,  1024, kernel_size=3, padding=1) 
        self.conv11 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.relu = nn.ReLU(inplace=False)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        #self.fc = nn.Linear(1024, num_classes)
        initialize_weights(self.modules(), init_mode='he')

    def get_masked_imgs(self, imgs, activations):
      # Multiply every image with every normalized activation map in one
      # vectorized step; returns a (num_maps * batch, d, r, c) tensor.
      b, d, r, c = imgs.shape
      _, A, _, _ = activations.shape
      imgs = imgs.reshape(-1)
      imgs = imgs.repeat(A)
      activations = activations.permute(1,0,2,3)
      activations = activations.repeat(1,1,d,1)
      activations = activations.reshape(-1)
      mul = activations*imgs
      mul = mul.reshape(-1,d,r,c)
      return mul

    def activation_wise_normalization(self, activations):
      # Min-max normalize each activation map independently; maps that are
      # constant (max == min) are dropped to avoid division by zero.
      b,f,h,w = activations.shape
      activations = activations.view(-1,h*w)
      max_ = activations.max(dim=1)[0]
      min_ = activations.min(dim=1)[0]
      check = ~max_.eq(min_)
      max_ = max_[check]
      min_ = min_[check]
      activations = activations[check,:]
      sub_ =  max_ - min_
      sub_1 = activations - min_[:,None]
      norm = sub_1 / sub_[:,None]
      norm = norm.view(b,-1,h,w)
      return norm  

    def get_scores(self, imgs, targets):
      # Feed the masked images to the network in chunks via a data loader
      # (instead of a per-image for loop) and keep the softmax probability
      # of the target class for each masked image.
      b, _, _, _ = imgs.shape
      total_scores = []
      # Minimal map-style dataset wrapping the masked-image tensor.
      class MyDataloader(torch.utils.data.Dataset):
        def __init__(self, images):
            self.images = images
        def __len__(self):
            return self.images.shape[0]
        def __getitem__(self, idx):
            return self.images[idx, :, :, :]

      train_data = MyDataloader(imgs)
      train_loader = torch.utils.data.DataLoader(train_data,
                                                shuffle=False,
                                                num_workers=0,
                                                batch_size=50)
      for batch_images in train_loader:
        scores = self.sub_forward(batch_images)
        scores = F.softmax(scores, dim=1)
        labels = targets.long()
        scores = scores[:,labels]
        total_scores.append(scores)
      total_scores = torch.cat(total_scores,dim=0)
      total_scores = total_scores.view(-1)
      return total_scores

    def get_cam(self, activations, scores):
      # Weight each normalized activation map by its score and sum over maps
      # to obtain the Score-CAM saliency map.
      b,f,h,w = activations.shape
      cam = activations*scores[None,:,None,None]
      cam = cam.sum(1, keepdim=True)
      return cam

    def sub_forward(self, x):
      # Shared backbone followed by three convolutional heads; their outputs
      # are summed and global-average-pooled into the class logits.
      x1 = self.features(x)
      x1 = self.conv6(x1)
      x1 = self.relu(x1)
      x1 = self.conv7(x1)
      x1 = self.relu(x1)

      x2 = self.features(x)
      x2 = self.conv8(x2)
      x2 = self.relu(x2)
      x2 = self.conv9(x2)
      x2 = self.relu(x2)

      x3 = self.features(x)
      x3 = self.conv10(x3)
      x3 = self.relu(x3)
      x3 = self.conv11(x3)
      x3 = self.relu(x3)

      x = x1 + x2 + x3
      x = self.avgpool(x)
      x = x.view(x.size(0), -1) 
      return x

    def forward(self, imgs, labels=None, return_cam=False):
        # Standard classification forward; when return_cam is True, also build
        # the Score-CAM saliency map from the backbone activations.
        x = self.sub_forward(imgs)

        if return_cam:
          with torch.no_grad():
            batch_size, D, H, W = imgs.shape
            y = self.features(imgs)
            y = F.relu(y)
            y = F.interpolate(y, (H, W), mode='bilinear', align_corners=False)
            y = self.activation_wise_normalization(y)
            z = self.get_masked_imgs(imgs, y)
            z = self.get_scores(z, labels)
            y = self.get_cam(y,z)
            y = F.relu(y)
            y = normalize_tensor(y)
            y = y.squeeze_(0).detach().clone()
            return y

        return {'logits': x}

umairjavaid commented 3 years ago

This is how I implemented it. I take the activation maps from within my model by returning them when return_cam = True. The important parts I want to share are how I implemented self.activation_wise_normalization, self.get_masked_imgs, self.get_scores, and self.get_cam; these work in my code. I did not run your code, so kindly adapt these pieces into it accordingly.
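
For anyone who wants to try it, a rough usage sketch (the backbone, input size, and class index are assumptions for illustration, and it presumes the external initialize_weights and normalize_tensor helpers are available):

import torch
import torchvision

backbone = torchvision.models.vgg16().features     # in practice, load pretrained weights here
model = myModel15(backbone, num_classes=1000).eval()

imgs = torch.randn(1, 3, 224, 224)                 # one preprocessed image
labels = torch.tensor([243])                       # hypothetical target class index

logits = model(imgs)['logits']                             # plain classification forward
cam = model(imgs, labels=labels, return_cam=True)          # (1, H, W) Score-CAM saliency map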

umairjavaid commented 3 years ago

Will you consider my code? It took a lot of effort to get it done. Thank you :)

haofanwang commented 3 years ago

Hi @umairjavaid, it's a good implementation, but I don't have time to clean the code up so that it fits the current coding style.