Open · Hasanmog opened this issue 6 months ago
I ran into the same issue and made a few modifications to the way I load my fine-tuned BERT model. As far as I can tell, the newly initialized weights are only for the pooler layer. In my case, I fine-tuned a BERT model using MLM, which doesn't train the pooling layer since it's not required for that task. As a result, when I save the model it doesn't include those parameters, and when I reload it, it produces the error you mentioned.
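If you want to see exactly which weights trigger the warning, transformers can report them directly; a minimal sketch (the checkpoint path 'my-mlm-checkpoint' is a placeholder for your own fine-tuned MLM checkpoint directory):

from transformers import BertModel

# 'my-mlm-checkpoint' is a placeholder for the fine-tuned MLM checkpoint dir
model, loading_info = BertModel.from_pretrained(
    'my-mlm-checkpoint', output_loading_info=True
)

# Keys that were not found in the checkpoint and were newly initialized;
# for an MLM checkpoint this should be just the pooler weights
print(loading_info['missing_keys'])
# expected: ['pooler.dense.weight', 'pooler.dense.bias']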
From what I understand, the GD model also does not use the pooled output; it uses BertModelWarper() simply to access the encoder's hidden states more easily. In the GroundingDINO module (from models/groundingdino.py), lines 267-269, the code takes the last hidden state and trains another linear layer on top of it, ignoring the pooling layer entirely.
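As a minimal sketch of that pattern (not GD's exact code; the 256-dim projection and the variable names are my own assumptions):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')

# Trainable projection on top of the per-token features, analogous to the
# linear layer GD trains on the last hidden state; 256 is an assumed target dim
text_proj = torch.nn.Linear(bert.config.hidden_size, 256)

inputs = tokenizer('a cat on a mat', return_tensors='pt')
outputs = bert(**inputs)

# Per-token features from the last encoder layer -- pooler_output is never used
text_features = text_proj(outputs.last_hidden_state)  # shape (1, seq_len, 256)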
I think that when loading a fine-tuned BERT model, only the pooler-layer weights are newly initialized; the fine-tuned parameters themselves are loaded as saved. So in theory, for using GD, the warning doesn't really matter. Still, for my own sanity, and to ensure nothing is initialized randomly, I implemented the function below to load my fine-tuned BERT model and re-initialize the pooler weights with the ones from a pre-trained BERT model.
import torch
from transformers import BertModel

def mlm2bm(text_encoder_type, model_path):
    ''' Load a fine-tuned MLM model as a BertModel and replace the pooling-layer
    weights with the original pre-trained ones. MLM models don't train the
    pooling layer, so the saved checkpoint is missing those parameters. '''
    original_model = BertModel.from_pretrained(text_encoder_type)
    print('Loaded original model')
    # Save the original pooling-layer weights
    original_pooler_weight = original_model.pooler.dense.weight.clone()
    original_pooler_bias = original_model.pooler.dense.bias.clone()
    # Load the fine-tuned MLM model as a BertModel
    # (the pooler is randomly initialized here, triggering the warning)
    fine_tuned_model = BertModel.from_pretrained(model_path)
    # Replace the randomly initialized pooling-layer weights with the original ones
    fine_tuned_model.pooler.dense.weight = torch.nn.Parameter(original_pooler_weight)
    fine_tuned_model.pooler.dense.bias = torch.nn.Parameter(original_pooler_bias)
    print('Replaced pooler weights!')
    return fine_tuned_model
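A possible way to wire this into GD (the paths below are placeholders I made up): save the fixed model to a directory and point the text_encoder_type entry in the config at it. GD also loads the tokenizer from that same path, so save the tokenizer alongside.

from transformers import BertTokenizer

# Placeholder paths
fixed_bert = mlm2bm('bert-base-uncased', 'my-mlm-checkpoint')
fixed_bert.save_pretrained('my-mlm-checkpoint-with-pooler')
# GD resolves the tokenizer from text_encoder_type as well
BertTokenizer.from_pretrained('bert-base-uncased').save_pretrained('my-mlm-checkpoint-with-pooler')

# In the GD config: text_encoder_type = 'my-mlm-checkpoint-with-pooler'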
Hope this helps!
Hello, I introduced some parameters into the text encoder (BERT) and trained for some epochs. Everything ran smoothly, but when evaluating with the resulting checkpoint, I'm getting this warning about the new BERT params:
"Some weights of Bert Model were not initialized from model checkpoint at bert-uncased and are newly initialized"
I think it will evaluate using the vanilla BERT (without these newly added params). My guess is that the trained params are loaded only after the BERT model itself is built, which is what causes this warning.
So, what should I change in the config file so that evaluation uses BERT with these newly introduced, trained params?
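One way I could verify this guess is something like the sketch below ('checkpoint.pth' and the 'model' key are assumptions based on the DETR-style training loop, and model stands for the GroundingDINO model built as in the eval script):

import torch

# model = the GroundingDINO model built via build_model(args), as in the eval script
ckpt = torch.load('checkpoint.pth', map_location='cpu')
state = ckpt.get('model', ckpt)

# The BERT warning fires when BERT is first instantiated from text_encoder_type;
# this load should then overwrite those weights with the trained ones,
# including any newly introduced parameters
missing, unexpected = model.load_state_dict(state, strict=False)
print('missing:', missing)        # newly introduced BERT params should NOT appear here
print('unexpected:', unexpected)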
Thanks in advance!
cc: @aghand0ur