Closed umairjavaid closed 3 years ago
Hi, @umairjavaid
Thanks for your implementation. Before I merge it into the main branch, could you take a look at this PR? It seems to be doing the same thing.
(This is a great paper, and I'm really happy to be contributing to it.) I haven't looked at the whole implementation, but the part where the images multiplied by the feature maps are fed to the model is implemented with a for loop, which is much slower than using a data loader. I also changed how the activations are normalized: in my code the entire batch of feature maps is normalized at once using PyTorch tensor operations. Kindly see the difference.
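To make the normalization point concrete, here is a minimal sketch of min-max normalizing every activation map of a batch in one pass, next to a loop version for comparison. The function names are hypothetical and this is not the code from either PR. Unlike the implementation posted later in this thread, it maps constant maps to zeros instead of dropping them, which is an assumption made here so the output keeps its (B, C, H, W) shape.

```python
import torch


def normalize_per_map(activations: torch.Tensor) -> torch.Tensor:
    """Min-max normalize each (image, channel) map of a (B, C, H, W) tensor to [0, 1]."""
    b, c, h, w = activations.shape
    flat = activations.reshape(b, c, h * w)
    min_ = flat.min(dim=2, keepdim=True)[0]
    max_ = flat.max(dim=2, keepdim=True)[0]
    span = max_ - min_
    # Constant maps have span == 0. Dividing by 1 there yields zeros,
    # because flat - min_ is already zero for those maps.
    denom = torch.where(span > 0, span, torch.ones_like(span))
    return ((flat - min_) / denom).reshape(b, c, h, w)


def normalize_with_loop(activations: torch.Tensor) -> torch.Tensor:
    """Reference version that visits one map at a time."""
    out = torch.zeros_like(activations)
    for i in range(activations.shape[0]):
        for j in range(activations.shape[1]):
            m = activations[i, j]
            span = m.max() - m.min()
            if span > 0:
                out[i, j] = (m - m.min()) / span
    return out


torch.manual_seed(0)
acts = torch.rand(2, 3, 4, 5)
acts[0, 1] = 0.7  # a constant map, to exercise the span == 0 branch
batched = normalize_per_map(acts)
looped = normalize_with_loop(acts)
```

Both versions should agree up to floating-point tolerance; the batched one avoids the Python-level loop over maps.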
Cool, if possible, could you also add a Colab demo for Score-CAM? It will be helpful.
I can't provide the demo. You can use the code I provided.
Hi @umairjavaid, could you write a test file just like this?
I tried to use this scorecam-batchwise and ran into a couple of problems. First, there is a typo on line 76: it should be logit = self.model_arch(imgs).cuda(), not logit = self.model_arch(inputs).cuda(). Second, backward() is called in the forward() of scorecam-batchwise.py, and it can only be called without arguments on a scalar (i.e. a single data sample, not a batch), so this error arises:
RuntimeError: grad can be implicitly created only for scalar outputs
I'm not sure how to use this, so a test file would be necessary here.
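For reference, the error above can be reproduced outside the PR with a few lines. This is a minimal sketch with a random tensor standing in for the model output; it is not taken from scorecam-batchwise.py. It shows two standard ways to call backward() on a batch of values: reduce to a scalar first, or pass an explicit gradient argument.

```python
import torch

torch.manual_seed(0)
# Stand-in for the logits of a batch of 4 samples over 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 3, 5, 7])

# One target logit per sample: shape (4,), so not a scalar.
picked = logits[torch.arange(4), targets]

message = None
try:
    picked.backward()  # fails: no implicit gradient for non-scalar outputs
except RuntimeError as err:
    message = str(err)

# Option 1: reduce to a scalar before calling backward().
picked.sum().backward()
grad_from_sum = logits.grad.clone()

# Option 2: pass an explicit gradient with the same shape as the output.
logits.grad = None
picked = logits[torch.arange(4), targets]  # rebuild the graph freed by option 1
picked.backward(gradient=torch.ones_like(picked))
grad_from_ones = logits.grad.clone()
```

Both options produce the same gradient here: a one at each sample's target position and zeros elsewhere.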
import torch
import torch.nn as nn
import torch.nn.functional as F

# initialize_weights and normalize_tensor are defined elsewhere (not shown here).


class myModel15(nn.Module):
    def __init__(self, features, num_classes=1000, **kwargs):
        super(myModel15, self).__init__()
        self.features = features
        self.conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)
        self.conv7 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.conv8 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)
        self.conv9 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.conv10 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)
        self.conv11 = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.relu = nn.ReLU(inplace=False)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        #self.fc = nn.Linear(1024, num_classes)
        initialize_weights(self.modules(), init_mode='he')

    def get_masked_imgs(self, imgs, activations):
        # imgs: (b, d, r, c); activations: (b, A, r, c).
        # Returns (A*b, d, r, c): every image multiplied by each of its
        # activation maps, ordered map-major.
        b, d, r, c = imgs.shape
        _, A, _, _ = activations.shape
        imgs = imgs.reshape(-1)
        imgs = imgs.repeat(A)
        activations = activations.permute(1,0,2,3)
        activations = activations.repeat(1,1,d,1)
        activations = activations.reshape(-1)
        mul = activations*imgs
        mul = mul.reshape(-1,d,r,c)
        return mul

    def activation_wise_normalization(self, activations):
        # Min-max normalize every activation map in the batch at once.
        b,f,h,w = activations.shape
        activations = activations.view(-1,h*w)
        max_ = activations.max(dim=1)[0]
        min_ = activations.min(dim=1)[0]
        # Drop constant maps (max == min) to avoid dividing by zero.
        check = ~max_.eq(min_)
        max_ = max_[check]
        min_ = min_[check]
        activations = activations[check,:]
        sub_ = max_ - min_
        sub_1 = activations - min_[:,None]
        norm = sub_1 / sub_[:,None]
        norm = norm.view(b,-1,h,w)
        return norm

    def get_scores(self, imgs, targets):
        # Run the masked images through the model in batches of 50 and
        # collect the softmax score of the target class for each one.
        b, _, _, _ = imgs.shape
        total_scores = []

        class MyDataloader(torch.utils.data.Dataset):
            def __init__(self, images):
                self.images = images

            def __len__(self):
                return self.images.shape[0]

            def __getitem__(self, idx):
                return self.images[idx, :, :, :]

        train_data = MyDataloader(imgs)
        train_loader = torch.utils.data.DataLoader(train_data,
                                                   shuffle=False,
                                                   num_workers=0,
                                                   batch_size=50)
        for batch_images in train_loader:
            scores = self.sub_forward(batch_images)
            scores = F.softmax(scores, dim=1)
            labels = targets.long()
            scores = scores[:,labels]
            total_scores.append(scores)
        total_scores = torch.cat(total_scores,dim=0)
        total_scores = total_scores.view(-1)
        return total_scores

    def get_cam(self, activations, scores):
        # Weight each activation map by its score and sum over the maps.
        b,f,h,w = activations.shape
        cam = activations*scores[None,:,None,None]
        cam = cam.sum(1, keepdim=True)
        return cam

    def sub_forward(self, x):
        x1 = self.features(x)
        x1 = self.conv6(x1)
        x1 = self.relu(x1)
        x1 = self.conv7(x1)
        x1 = self.relu(x1)
        x2 = self.features(x)
        x2 = self.conv8(x2)
        x2 = self.relu(x2)
        x2 = self.conv9(x2)
        x2 = self.relu(x2)
        x3 = self.features(x)
        x3 = self.conv10(x3)
        x3 = self.relu(x3)
        x3 = self.conv11(x3)
        x3 = self.relu(x3)
        x = x1 + x2 + x3
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        return x

    def forward(self, imgs, labels=None, return_cam=False):
        x = self.sub_forward(imgs)
        if(return_cam == True):
            with torch.no_grad():
                batch_size, D, H, W = imgs.shape
                y = self.features(imgs)
                y = F.relu(y)
                y = F.interpolate(y, (H, W), mode='bilinear', align_corners=False)
                y = self.activation_wise_normalization(y)
                z = self.get_masked_imgs(imgs, y)
                z = self.get_scores(z, labels)
                y = self.get_cam(y,z)
                y = F.relu(y)
                y = normalize_tensor(y)
                y = y.squeeze_(0).detach().clone()
                return y
        return {'logits': x}
This is how I implemented it. I take the activation maps from within my model, by returning them when return_cam = True. The important thing I am trying to share is how I implemented self.activation_wise_normalization, self.get_masked_imgs, self.get_scores and self.get_cam. These work in my code. I did not run your code; kindly alter these in your code accordingly.
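Since a test file was requested earlier in the thread, here is a minimal sketch of how the masking step could be checked in isolation. masked_imgs_reference copies the flatten-and-repeat logic of get_masked_imgs into a free function; masked_imgs_broadcast is a hypothetical alternative that uses broadcasting, written here only for comparison. Shapes and values are arbitrary test data.

```python
import torch


def masked_imgs_reference(imgs: torch.Tensor, activations: torch.Tensor) -> torch.Tensor:
    """Flatten-and-repeat masking, following get_masked_imgs above."""
    b, d, r, c = imgs.shape
    a = activations.shape[1]
    flat_imgs = imgs.reshape(-1).repeat(a)
    acts = activations.permute(1, 0, 2, 3).repeat(1, 1, d, 1).reshape(-1)
    return (acts * flat_imgs).reshape(-1, d, r, c)


def masked_imgs_broadcast(imgs: torch.Tensor, activations: torch.Tensor) -> torch.Tensor:
    """Same result via broadcasting: (A, B, 1, H, W) * (1, B, D, H, W)."""
    b, d, r, c = imgs.shape
    masks = activations.permute(1, 0, 2, 3).unsqueeze(2)
    return (masks * imgs.unsqueeze(0)).reshape(-1, d, r, c)


torch.manual_seed(0)
imgs = torch.rand(2, 3, 4, 5)   # batch of 2 RGB images
acts = torch.rand(2, 6, 4, 5)   # 6 activation maps per image, already upsampled
ref = masked_imgs_reference(imgs, acts)
alt = masked_imgs_broadcast(imgs, acts)
```

The output is ordered map-major: row a * B + i holds image i multiplied by its activation map a, which is the order get_scores and get_cam have to agree on.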
Will you consider my code? It took a lot of effort to get it done. Thank you :)
Hi @umairjavaid, it's a good implementation. But I don't have time to clean the code so that it fits the current coding style.
This implementation can generate Score-CAM activation maps for multiple images in a batch. It is 25x faster than the original implementation because it uses a data loader to feed the images (multiplied by the normalized activation maps) to the model instead of a for loop.
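The batching idea can be sketched as follows. This is a minimal sketch, not the PR's code: it uses torch.split in place of a DataLoader, a toy linear classifier stands in for the real model, and score_masked_inputs is a hypothetical name. How much faster this is than a per-image loop depends on the model and hardware.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def score_masked_inputs(model, masked, target, chunk_size=50):
    """Return the softmax score of class `target` for every masked image."""
    scores = []
    # Feed the masked images in chunks instead of one forward pass per image.
    for chunk in torch.split(masked, chunk_size):
        probs = F.softmax(model(chunk), dim=1)
        scores.append(probs[:, target])
    return torch.cat(scores)


torch.manual_seed(0)
# Toy stand-in classifier: 3x8x8 inputs, 5 classes (an assumption for the demo).
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5)).eval()
masked = torch.rand(120, 3, 8, 8)  # e.g. 120 masked copies of one image

chunked_scores = score_masked_inputs(toy_model, masked, target=2)
# Same scores computed one image at a time, for comparison.
single_scores = score_masked_inputs(toy_model, masked, target=2, chunk_size=1)
```

The chunk size of 50 mirrors the batch_size used in get_scores above; in practice it would be tuned to the available memory.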