I have a question about the ablation study

I want to know whether the label input matters to the model, so in my ablation study I removed it. Here is how I did it:

    _x1 = _x1 / _x1.norm(dim=-1, keepdim=True)  # L2-normalize the pixel features
    l_fea1_like = torch.ones_like(l_fea1)       ### this is what I add
    # use the all-ones tensor instead of the real label features l_fea1
    logits_per_image1 = logit_scale1 * _x1 @ l_fea1_like.t().float()
    out1 = logits_per_image1.view(imshape[0][0], imshape[0][2], imshape[0][3], -1).permute(0, 3, 1, 2)
    cam1 = out1.clone().detach()
    cls1 = self.pooling(out1, (1, 1)).view(-1, l_fea1.shape[0])

As shown above, I added one line, "l_fea1_like = torch.ones_like(l_fea1)". To change the code as little as possible, I replaced the original l_fea1 with an all-ones tensor of the same shape, so that the label features are effectively not used.
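One plausible reason the loss stalls, assuming l_fea1_like is the tensor that enters the product with _x1: torch.ones_like produces an all-ones matrix, not an identity matrix, so every class row of the label-feature matrix is identical. Each pixel then gets exactly the same score for every class, and nothing downstream can tell the classes apart. A minimal sketch with random stand-in tensors (the sizes and the names pixel_feats and label_feats are hypothetical, not TPRO code):

```python
import torch

torch.manual_seed(0)

# Assumed toy sizes: 2 images on an 8x8 grid, 512-dim features, 4 classes.
num_pixels, dim, num_classes = 2 * 8 * 8, 512, 4

pixel_feats = torch.randn(num_pixels, dim)
pixel_feats = pixel_feats / pixel_feats.norm(dim=-1, keepdim=True)  # L2-normalize, like _x1

label_feats = torch.randn(num_classes, dim)        # hypothetical stand-in for real label features (l_fea1)
label_feats_like = torch.ones_like(label_feats)    # the ablation: an all-ones matrix, not an identity

logits_real = pixel_feats @ label_feats.t()        # (num_pixels, num_classes)
logits_ones = pixel_feats @ label_feats_like.t()   # (num_pixels, num_classes)

# Every column of logits_ones is the per-pixel sum of feature entries,
# so all classes receive the same score at every pixel.
same_ones = torch.allclose(logits_ones, logits_ones[:, :1].expand_as(logits_ones), atol=1e-6)
same_real = torch.allclose(logits_real, logits_real[:, :1].expand_as(logits_real), atol=1e-6)
print(same_ones, same_real)  # True False
print(torch.allclose(logits_ones[:, 0], pixel_feats.sum(dim=1), atol=1e-5))  # True
```

Multiplying by logit_scale1 or pooling afterwards does not change this: identical columns stay identical.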

However, when I run this ablation, the loss does not decrease, which suggests that the model is not learning anything.

Why does this happen? And how did you, the authors, run this ablation?

Thank you for your interest in our work! In the ablation study, I did not compute the similarity between label features and image features to obtain the 4×H×W feature map. Instead, I first projected the encoder output to 4 channels (one per class) with a 1×1 convolution, and then applied Global Average Pooling (GAP) to obtain the class predictions.

I added a convolution layer to the model:

    self.conv_head = nn.Conv2d(in_channels=final_dimension, out_channels=self.num_classes, kernel_size=1, bias=True)

In the forward function, it is implemented as:

    x = self.conv_head(x)
    cam = x.detach().clone()
    out = self.pooling(x, (1, 1)).view(-1, self.num_classes)
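The head described in this reply can be sketched as a self-contained PyTorch module. This is an illustration, not the TPRO source: the class name ConvGAPHead, the feature sizes, and the choice of F.adaptive_avg_pool2d for self.pooling are assumptions (the reply only shows the call self.pooling(x, (1, 1))).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGAPHead(nn.Module):
    """Hypothetical ablation head: 1x1 conv to per-class maps, then global average pooling.

    Assumes `pooling` behaves like F.adaptive_avg_pool2d; the reply does not confirm this.
    """

    def __init__(self, final_dimension: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        # One output channel per class; no label features are involved.
        self.conv_head = nn.Conv2d(final_dimension, num_classes, kernel_size=1, bias=True)
        self.pooling = F.adaptive_avg_pool2d

    def forward(self, x: torch.Tensor):
        x = self.conv_head(x)                                      # (B, num_classes, H, W)
        cam = x.detach().clone()                                   # class activation maps, no gradient
        out = self.pooling(x, (1, 1)).view(-1, self.num_classes)   # (B, num_classes) class scores
        return out, cam


# Usage with made-up sizes: 512-dim encoder features on a 14x14 grid, 4 classes.
head = ConvGAPHead(final_dimension=512, num_classes=4)
feats = torch.randn(2, 512, 14, 14)
out, cam = head(feats)
print(out.shape, cam.shape)  # torch.Size([2, 4]) torch.Size([2, 4, 14, 14])
```

Because the 1×1 convolution learns its own per-class weights, each class gets a distinct map, unlike the all-ones substitution.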

Also, in our actual training the loss weight for stage 1 is set to 0.0, so it is expected that modifying the stage-1 code has no effect.
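To illustrate why a zero weight makes stage-1 changes irrelevant, here is a minimal sketch of a weighted sum of per-stage losses. The two linear "stages", the weight list, and the use of nn.MultiLabelSoftMarginLoss are assumptions for illustration only, not TPRO's training code. With weight 0.0, the stage-1 branch receives an all-zero gradient from the total loss, so its own loss gets no training signal and is not expected to decrease.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical stage branches producing class scores from shared features.
backbone_feats = torch.randn(8, 16)
stage1 = nn.Linear(16, 4)
stage2 = nn.Linear(16, 4)
labels = torch.randint(0, 2, (8, 4)).float()  # multi-hot image-level labels

criterion = nn.MultiLabelSoftMarginLoss()
loss_weights = [0.0, 1.0]  # assumed: stage-1 weight 0.0, as described in the reply

loss1 = criterion(stage1(backbone_feats), labels)
loss2 = criterion(stage2(backbone_feats), labels)
total = loss_weights[0] * loss1 + loss_weights[1] * loss2
total.backward()

# Stage-1 parameters get an all-zero gradient from this loss; stage 2 does not.
print(stage1.weight.grad.abs().sum().item())      # 0.0
print(stage2.weight.grad.abs().sum().item() > 0)  # True
```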

Thank you very much! I have solved it.

zhangst431 / TPRO, issue #1