CarloNicolini opened this issue 6 months ago
Hi, the issue here is that I have not yet found a satisfactory answer as to how to implement Embedding layers in NNGeometry.
A possible temporary workaround would be to emulate the embedding with a Linear layer applied to one-hot encoded inputs.
Another possible workaround would be to compute the FIM for all parameters except the Embedding layer's, by manually creating the LayerCollection object passed to the FIM constructor, instead of relying on the default, which adds all PyTorch modules with parameters.
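As a sketch of the second workaround, one could build the collection by iterating over the model's modules and skipping `nn.Embedding` (the `nngeometry` import path and the `add_layer_from_model` method name are my reading of the library's API, not verified here):

```python
import torch.nn as nn


def layer_collection_without_embeddings(model):
    """Sketch: build a LayerCollection that skips nn.Embedding modules.

    The nngeometry import path and add_layer_from_model method name are
    assumptions about the library's API.
    """
    from nngeometry.layercollection import LayerCollection  # assumed path

    lc = LayerCollection()
    for module in model.modules():
        # Only add leaf modules that own parameters and are not embeddings
        has_params = len(list(module.parameters(recurse=False))) > 0
        if has_params and not isinstance(module, nn.Embedding):
            lc.add_layer_from_model(model, module)  # assumed method name
    return lc
```

The resulting collection would then be passed to the FIM constructor in place of the default one.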
Would you kindly expand on the first workaround? I tried to follow your suggestion and came up with this snippet, but I am not sure this is what you intended; could you please check?
import torch.nn as nn
import torch.nn.functional as F


class OneHotLinearEmbedding(nn.Module):
    def __init__(self, pretrained_embedding_layer):
        super().__init__()
        # Vocabulary size and embedding dimension of the pretrained layer
        num_embeddings, embedding_dim = pretrained_embedding_layer.weight.size()
        # Linear layer mapping one-hot vectors (size num_embeddings) to embeddings
        self.embedding_layer = nn.Linear(num_embeddings, embedding_dim, bias=False)
        # nn.Linear stores its weight as (out_features, in_features), so transpose
        self.embedding_layer.weight.data.copy_(
            pretrained_embedding_layer.weight.data.t()
        )

    def forward(self, input_indices):
        # One-hot encode the token indices
        one_hot = F.one_hot(
            input_indices, num_classes=self.embedding_layer.in_features
        ).float()
        # Apply the linear layer to recover the embeddings
        return self.embedding_layer(one_hot)
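To sanity-check the idea on a toy example, one can compare the one-hot + Linear construction directly against nn.Embedding (note that nn.Linear stores its weight as (out_features, in_features), hence the transpose):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 10, 4
emb = nn.Embedding(vocab_size, dim)

# Equivalent linear layer: one-hot vectors of size vocab_size -> dim
lin = nn.Linear(vocab_size, dim, bias=False)
with torch.no_grad():
    # nn.Linear weight is (out_features, in_features) = (dim, vocab_size)
    lin.weight.copy_(emb.weight.t())

idx = torch.tensor([1, 3, 7])
out = lin(F.one_hot(idx, num_classes=vocab_size).float())
assert torch.allclose(emb(idx), out)
```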
Apparently though, after I replace the input Embedding layer of my model with this, the FIM computation breaks when dealing with LayerNorm. Is that another layer still to be implemented in NNGeometry?
Hi, just a quick update: I should be able to find some time to fix this later this week or next week.
You are indeed right that LayerNorm is not implemented yet. After a quick glance, it looks like it can be implemented very similarly to BatchNorm, which means it should not be too difficult to add to NNGeometry.
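For what it's worth, the parameter structure of the two layers is indeed the same, which is presumably why the per-parameter machinery carries over; a quick check:

```python
import torch.nn as nn

ln = nn.LayerNorm(8)
bn = nn.BatchNorm1d(8)

# Both expose an elementwise affine weight (scale) and bias (shift)
# of the normalized size
assert ln.weight.shape == bn.weight.shape == (8,)
assert ln.bias.shape == bn.bias.shape == (8,)
```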
Your implementation of the Linear-layer workaround for the embedding layer looks correct to me.
I will keep you updated as soon as I make progress!
Hello, for some reason implementing LayerNorm broke the test suite for other types of layers (Cosine and WeightNorm), so I cannot merge it to master yet. It additionally requires some more cleaning.
In the meantime you can use this branch; it should do the job for your use case. Otherwise, can you provide me with the simplest architecture for which it fails?
Best
First of all, great library! I have long been looking for a way to get Jacobians and Fisher information matrices for my PyTorch models. While the library works fine with my vision models based on simple convolutional networks, I find it harder to use with Huggingface pretrained models. To be clear, I believe the embedding layers are the culprit here.
I devised a dataloader that takes a list of strings as input and yields batches as dictionaries with "input_ids" and "attention_mask" keys, whose values are torch.Tensors of integer type.
Then I instantiate the dataloader.
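A minimal sketch of what such a dataloader can look like; here a toy character-level "tokenizer" stands in for the real Huggingface tokenizer, and the dataset/field names are just illustrative:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TextDataset(Dataset):
    """Toy stand-in: a real Huggingface tokenizer would replace __getitem__."""

    def __init__(self, texts, max_len=8):
        self.texts = texts
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        # Fake "tokenization": one id per character, truncated/padded to max_len
        ids = [ord(c) % 100 for c in self.texts[i]][: self.max_len]
        attn = [1] * len(ids)
        pad = self.max_len - len(ids)
        return {
            "input_ids": torch.tensor(ids + [0] * pad),
            "attention_mask": torch.tensor(attn + [0] * pad),
        }


loader = DataLoader(TextDataset(["hello", "world!"]), batch_size=2)
batch = next(iter(loader))
# batch is a dict with integer tensors of shape (batch_size, max_len)
```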
For a model with a total of 70M parameters, keeping the entire Fisher matrix in memory is prohibitive, so I have chosen to use the diagonal, whose storage is proportional to the number of parameters, by choosing the PMatDiag representation you kindly provide in your library. I thought this would give me the diagonal of the Fisher information matrix, right? However, an error appears that seems related to the LayerCollection creation.
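For reference, the call I am attempting looks roughly like this (a sketch: the import paths and the `variant` keyword are what I understood from the docs, not guaranteed to match the installed version):

```python
def compute_fim_diag(model, loader, n_output):
    """Sketch of a diagonal-FIM computation with nngeometry.

    Import paths and keyword names are assumptions about the library's API.
    """
    from nngeometry.metrics import FIM      # assumed import path
    from nngeometry.object import PMatDiag  # assumed import path

    return FIM(
        model=model,
        loader=loader,
        representation=PMatDiag,  # stores only the diagonal: O(n_params) memory
        n_output=n_output,
        variant="classif_logits",  # assumption: classification-style output
    )
```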
But I get the following error:
It looks like the reason I get this error has to do with the Embedding layers: there are two of them, one converting token ids from the vocabulary space (size 50304) to the latent space (size 512), and another at the end doing the reverse. What should I do to get the FIM diagonal for all model parameters? Many thanks, and again, great package.