Closed: MiraclesinWang closed this issue 2 years ago.
To demonstrate my question, I used pdb to step through the `from_pretrained` function of `transformers.modeling_utils.PreTrainedModel`; here is a screenshot of the result. As you can see, some of your model's parameters appear among the 'missing keys', which means they are not initialised from the checkpoint.
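For anyone else debugging this, transformers can report the same information directly via `output_loading_info=True`, without stepping through pdb. A minimal sketch, assuming this repo's layout (`models/xbert.py` and `configs/config_bert.json`); adjust the import and config paths if they differ:

```python
from models.xbert import BertConfig, BertModel  # ALBEF's rewritten BERT

config = BertConfig.from_json_file("configs/config_bert.json")
model, loading_info = BertModel.from_pretrained(
    "bert-base-uncased", config=config, output_loading_info=True
)

# Parameters present in the model but absent from the hub checkpoint are
# left at their random initialisation and reported as missing keys.
print([k for k in loading_info["missing_keys"] if "crossattention" in k])
```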
We add the cross-attention layers as additional parameters. They are randomly initialised and then trained during ALBEF pre-training.
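If it helps to verify this, the released ALBEF checkpoint should contain those cross-attention weights. A minimal sketch, assuming a downloaded checkpoint at `ALBEF.pth` whose weights sit under the `model` key, as in the repo's training scripts:

```python
import torch

# Hypothetical local path to a downloaded ALBEF pre-trained checkpoint.
state_dict = torch.load("ALBEF.pth", map_location="cpu")["model"]

# Unlike bert-base-uncased, this checkpoint includes the cross-attention
# parameters, since ALBEF pre-training trained them.
cross_keys = [k for k in state_dict if "crossattention" in k]
print(len(cross_keys), cross_keys[:3])
```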
Thanks for your answer.
Hello, ALBEF is really an amazing VLP model; thanks for your contribution. Nevertheless, I have encountered a problem when using it. Could you help me with it?
I know you rewrote `transformers.models.bert.modeling_bert` and put it in `xbert.py`. After checking your file, I found that some parameters were renamed when rewriting the last 6 layers of BERT. For example, your model has a parameter named `bert.encoder.layer.10.crossattention.self.value.weight`, while its counterpart in the model defined in `transformers.models.bert.modeling_bert` is `bert.encoder.layer.10.attention.self.value.weight`. However, you initialise BERT with the `from_pretrained` function defined in `transformers.modeling_utils.PreTrainedModel`.
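To illustrate the mismatch: the hub checkpoint for `bert-base-uncased` contains only `attention.*` keys and nothing named `crossattention.*`, which can be checked with stock transformers (note that stock `BertModel` keys carry no `bert.` prefix):

```python
from transformers import BertModel

hub_model = BertModel.from_pretrained("bert-base-uncased")
keys = list(hub_model.state_dict().keys())

print(any("crossattention" in k for k in keys))  # False
print([k for k in keys if "layer.10.attention.self.value" in k])
# ['encoder.layer.10.attention.self.value.weight',
#  'encoder.layer.10.attention.self.value.bias']
```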
To the best of my knowledge, the `from_pretrained` function just mentioned downloads parameters from the model hub when no local path is given, which is what happens when the first argument is set to 'bert-base-uncased'. However, the downloaded parameters are named exactly as in `transformers.models.bert.modeling_bert`, which differs slightly from your naming. Consequently, some parameters in your model, like the `bert.encoder.layer.10.crossattention.self.value.weight` just mentioned, cannot be initialised from the checkpoint.
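That matches how non-strict loading behaves in PyTorch generally: keys the checkpoint does not cover simply keep their random initialisation. A toy sketch with hypothetical module names:

```python
import torch.nn as nn

class Plain(nn.Module):            # stands in for stock BERT
    def __init__(self):
        super().__init__()
        self.attention = nn.Linear(4, 4)

class WithCross(nn.Module):        # stands in for the xbert variant
    def __init__(self):
        super().__init__()
        self.attention = nn.Linear(4, 4)       # matched by the checkpoint
        self.crossattention = nn.Linear(4, 4)  # unmatched: stays random

ckpt = Plain().state_dict()
result = WithCross().load_state_dict(ckpt, strict=False)
print(result.missing_keys)
# ['crossattention.weight', 'crossattention.bias']
```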
I can't find any code that deals with this problem. Is this a bug? Is there some handling of it that I have overlooked? Or is it unimportant, since you pre-train those parameters afterwards anyway?