pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Multiple differently shaped inputs are crashing Integrated Gradients (embedding layers + other layers) #928

Open code-ksu opened 2 years ago

code-ksu commented 2 years ago

❓ Questions and Help

Hello. I have developed a model with three input types: image, categorical data, and numerical data. For the image data I used ResNet50; for the other two I developed my own network.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MulticlassClassification(nn.Module):
    def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
        super(MulticlassClassification, self).__init__()

        # IMAGE: ResNet
        self.cnn = models.resnet50(pretrained = True)
        for param in self.cnn.parameters():
            param.requires_grad = False
        n_inputs = self.cnn.fc.in_features
        self.cnn.fc = nn.Sequential(
          nn.Linear(n_inputs, 250), 
          nn.ReLU(), 
          nn.Dropout(p),
          nn.Linear(250, output_size),                   
          nn.LogSoftmax(dim=1)
        )

        # TABULAR 
        self.all_embeddings = nn.ModuleList(
            [nn.Embedding(categories, size) for categories, size in cat_size]
        )
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_col)

        all_layers = []
        num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
        input_size = num_cat_col + num_col

        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

        #combine
        self.combine_fc = nn.Linear(output_size * 2, output_size)

    def forward(self, image, x_categorical, x_numerical):
        embeddings = []
        for i, embedding in enumerate(self.all_embeddings):
            # debug print: shows what actually reaches the embedding layers
            print(x_categorical[:,i])
            embeddings.append(embedding(x_categorical[:,i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)

        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)

        # img
        x2 = self.cnn(image)

        # combine
        x3 = torch.cat([x, x2], 1)
        x3 = F.relu(self.combine_fc(x3))

        # return the combined image + tabular prediction
        return x3

Now after successful training I would like to calculate integrated gradients

from captum.attr import IntegratedGradients

testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)

ig = IntegratedGradients(model)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())

And here I got an error: RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding). Using the print in my forward method, I figured out that Captum injects a wrongly shaped tensor into my x_categorical input. It seems like Captum only sees the first input tensor and uses its shape for all other inputs. Is this correct?

I've found Issue #439 and tried all suggested solutions without success. When I used an InterpretableEmbedding for the categorical data I got this error: IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I would be very grateful for any tips and advice on how to combine all three inputs and solve my problem.

NarineK commented 2 years ago

@code-ksu, Integrated Gradients assumes that the first dimension of every tensor passed through the inputs argument is the same and corresponds to the number of examples (the batch size). This is because Integrated Gradients must scale each input along the batch dimension according to the n_steps argument. We usually see this type of error when the forward method does not treat the first dimension as the batch size. This is only a high-level explanation; if you can share a Colab notebook I can debug it and tell you where exactly the issue is.
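To illustrate the point, here is a rough sketch of the kind of scaling Integrated Gradients applies to every input (illustrative only, not Captum's actual implementation): each tensor is interpolated between a baseline and the real input, so it is expanded to n_steps * batch_size rows along the first dimension and becomes a float tensor, which is exactly what breaks an nn.Embedding lookup that expects integer indices.

import torch

# Rough sketch of the per-input scaling Integrated Gradients performs
# (illustrative only, not Captum's actual implementation).
def scale_input(inp, baseline, n_steps=50):
    alphas = torch.linspace(0, 1, n_steps).view(-1, *([1] * inp.dim()))
    scaled = baseline + alphas * (inp.float() - baseline)   # float interpolation
    # result has shape (n_steps * batch, ...): the first dimension must mean
    # "batch" for every input, and integer indices have become floats
    return scaled.reshape(-1, *inp.shape[1:])

stack_cat = torch.randint(0, 10, (4, 6))            # integer category indices
scaled = scale_input(stack_cat, torch.zeros(1))
print(scaled.dtype, scaled.shape)                    # torch.float32 torch.Size([200, 6])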

code-ksu commented 2 years ago

Dear @NarineK, thank you very much for your answer! I've prepared a Google Colab notebook; here is the link: https://colab.research.google.com/drive/1sH0PSSmJUulOLmz4jtDrHrrldu3rgyNK?usp=sharing I've also prepared a small sample of my data set; the link to it is here: https://drive.google.com/drive/folders/1sQwWRB-VORAKdPCTa-fqgccy7hg0Rd_Q?usp=sharing Thank you in advance for your help!

NarineK commented 2 years ago

@code-ksu, thank you very much for sending the link, and sorry for the late reply. I cannot access the Colab notebook. Is there a publicly shareable link to the notebook?

code-ksu commented 2 years ago

Dear @NarineK, the link is publicly shareable. Could you please try one more time? https://colab.research.google.com/drive/1sH0PSSmJUulOLmz4jtDrHrrldu3rgyNK?usp=sharing Thank you in advance for your help.

NarineK commented 2 years ago

@code-ksu, thank you for sharing. I see that the Colab notebook accesses the entire dataset and performs training. Could you provide only the part where you perform the attributions? Loading the model and running a prediction on one example or a small batch will be sufficient. Currently the notebook wants to access Google Drive and run the training, but we can skip that part for the attributions.

code-ksu commented 2 years ago

Dear @NarineK, I don't perform the training, but as far as I understand I need data to perform the attributions as well. I don't know any other way to supply you with the needed data; that's why I attached the second link to my public Google Drive (https://drive.google.com/drive/folders/1sQwWRB-VORAKdPCTa-fqgccy7hg0Rd_Q?usp=sharing). This is already a small sample of my data (only 100 entries per set) as well as my model's state. You could add these files to your own Google Drive and, if needed, adjust the base_data_dir in this line: https://colab.research.google.com/drive/1sH0PSSmJUulOLmz4jtDrHrrldu3rgyNK#scrollTo=TywcEVkvjsZI&line=1&uniqifier=1

Best regards

NarineK commented 2 years ago

@code-ksu, I mentioned that it is training because I can clearly see these lines in a Colab cell:

#CrossEntropyLoss
model, history = train(
    model,
    criterion,
    optimizer,
    dataloaders['train'],
    dataloaders['val'],
    save_file_name=save_file_name,
    max_epochs_stop=3,
    n_epochs=50,
    print_every=1)

In order for me to debug your use case I don't need any trained weights. It can just as well be an untrained model with random weights.

code-ksu commented 2 years ago

Dear @NarineK, I've removed the unnecessary code lines. I hope everything works this time. I still think you may need to copy the files to your own Drive; I don't know if my Drive authorization works for you. Best regards.

NarineK commented 2 years ago

@code-ksu, I was able to reproduce the error. I might be looking at the wrong cell, but the model fails for me even if I try a simple predict. For example, if I run print(model(img.to(device), stack_cat.to(device), stack_num.to(device))) in the cell where you load the data, it fails. Does this happen for you too?

testiter = iter(testloader)
# Get a batch of testing images and labels
img, stack_cat, stack_num, target = next(testiter)

print(model(img.to(device), stack_cat.to(device), stack_num.to(device)))

I see the same error you pointed me to:

IndexError                                Traceback (most recent call last)
[<ipython-input-61-edf5aa2f7a1a>](https://localhost:8080/#) in <module>()
      7 #attributions_ig = ig.attribute(inputs=(img.to(device), stack_cat.to(device), stack_num.to(device)), target=target.to(device))
      8 
----> 9 print(model(img.to(device), stack_cat.to(device), stack_num.to(device)))
     10 
     11 attrs = lig.attribute(inputs=(img.to(device), stack_cat.to(device), stack_num.to(device)), target=target.to(device))

1 frames
[<ipython-input-19-6e17c191d90b>](https://localhost:8080/#) in forward(self, image, x_categorical, x_numerical)
     47             print(x_categorical[:,i])
     48             embeddings.append(embedding(x_categorical[:,i]))
---> 49         x = torch.cat(embeddings, 1)
     50         x = self.embedding_dropout(x)
     51 

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

code-ksu commented 2 years ago

Dear @NarineK, I can make predictions, and the line that errored for you works for me. But I got another error in combination with LayerIntegratedGradients: a problem with the device "cpu", which is strange because to(device) worked in the previous code lines. I assume there is some issue either with Captum or with Captum in combination with Google Colab. I've tried to google the error without success. Best regards.

KeyError                                  Traceback (most recent call last)

[<ipython-input-42-edf5aa2f7a1a>](https://localhost:8080/#) in <module>()
      9 print(model(img.to(device), stack_cat.to(device), stack_num.to(device)))
     10 
---> 11 attrs = lig.attribute(inputs=(img.to(device), stack_cat.to(device), stack_num.to(device)), target=target.to(device))

6 frames

[/usr/local/lib/python3.7/dist-packages/captum/_utils/gradient.py](https://localhost:8080/#) in <listcomp>(.0)
    328     if key_list is None:
    329         key_list = _sort_key_list(list(saved_layer.keys()), device_ids)
--> 330     return _reduce_list([saved_layer[device_id] for device_id in key_list])
    331 
    332 

KeyError: device(type='cpu')

NarineK commented 2 years ago

@code-ksu, I think we are seeing that issue with the device because the all_embeddings layer might not return any output. We could attribute to the individual embedding layers instead, e.g. model_wrapped.module.all_embeddings[0]. I tried this, and when I ran the Colab notebook it crashed for a large number of integral approximation steps. I then reduced the number of steps to 2 and it works; if you run it with more memory I'd recommend increasing that number. Currently it is able to calculate the attributions for the sub-embeddings.
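For reference, the per-sub-embedding attribution looks roughly like this (a sketch using the notebook's variable names, e.g. model_wrapped, device and the test batch, which are assumed to exist):

from captum.attr import LayerIntegratedGradients

# attribute to a single categorical sub-embedding instead of the whole
# ModuleList; n_steps is kept very small to fit into Colab memory
lig = LayerIntegratedGradients(model_wrapped, model_wrapped.module.all_embeddings[0])

attrs = lig.attribute(
    inputs=(img.to(device), stack_cat.to(device), stack_num.to(device)),
    target=target.to(device),
    n_steps=2,
)
print(attrs.shape)  # (batch_size, embedding_dim) for this sub-embedding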

NarineK commented 2 years ago

@code-ksu, if this resolved your problem, can we close this issue?

code-ksu commented 2 years ago

Dear @NarineK, sorry for the long delay, and thank you very much for your help. I would like to ask one more question. I have tried to calculate the "Average Feature Importances" based on the Captum tutorial. attrs contains only 4 objects, each of length 48, which does not align with the shape of my inputs, so I get the error "IndexError: list index out of range". I don't understand how to work out which features influence my model; I expected the output to have the same length as the number of features I use in my model. I would highly appreciate any advice from you. And one more time, thank you for your help. Best regards.

NarineK commented 2 years ago

@code-ksu, if you use LayerIntegratedGradients the way I showed you, then the attributions will have the same shape as the outputs of the embedding layers that were passed to the constructor of LayerIntegratedGradients. Usually we know which embeddings correspond to which tokens, so we can sum or average the attributions across the embedding dimensions for a given token. If you cannot identify this at the embedding level, you could transform the model so that a layer returns token-level embeddings and attribute w.r.t. that layer.
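For example, a sketch of collapsing the per-embedding-unit attributions into one importance score per categorical column (again assuming the notebook's model_wrapped, device and test batch, and a small n_steps):

from captum.attr import LayerIntegratedGradients

feature_importance = []
for emb_layer in model_wrapped.module.all_embeddings:
    lig = LayerIntegratedGradients(model_wrapped, emb_layer)
    attrs = lig.attribute(
        inputs=(img.to(device), stack_cat.to(device), stack_num.to(device)),
        target=target.to(device),
        n_steps=2,
    )
    # attrs has shape (batch, embedding_dim); sum over the embedding
    # dimension and average over the batch to get one score per column
    feature_importance.append(attrs.sum(dim=-1).mean(dim=0).item())

print(feature_importance)  # one value per categorical feature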

You can also add an artificial Identity layer that returns the shape you want and attribute to that layer.
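A rough sketch of that idea (the subclass and the tabular_identity name are assumptions, not code from this thread): route the concatenated tabular features through an nn.Identity so that LayerIntegratedGradients has a layer whose output columns line up with your features.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MulticlassClassificationWithIdentity(MulticlassClassification):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # pass-through layer used purely as an attribution target
        self.tabular_identity = nn.Identity()

    def forward(self, image, x_categorical, x_numerical):
        embeddings = [emb(x_categorical[:, i]) for i, emb in enumerate(self.all_embeddings)]
        x = self.embedding_dropout(torch.cat(embeddings, 1))
        x_numerical = self.batch_norm_num(x_numerical)
        # every column here is one embedded categorical unit or one numerical
        # feature, so attributions to this layer are easy to interpret
        x = self.tabular_identity(torch.cat([x, x_numerical], 1))
        x = self.layers(x)
        x2 = self.cnn(image)
        return F.relu(self.combine_fc(torch.cat([x, x2], 1)))

Attribution would then go through LayerIntegratedGradients(model, model.tabular_identity), and the last dimension of the result matches the concatenated tabular features.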