Hi~
I want to use AlbertForPreTraining for a masked LM task on a new dataset (with a new vocab.txt whose size is not 21128), starting from the pretrained weights. How exactly can I do that?
After reading the code, I found that the decoder (Linear) in predictions (AlbertLMPredictionHead) in cls (AlbertPreTrainingHeads) of AlbertForPreTraining has an output size of 21128, which seems hard to adapt to my new vocab size.
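To make the mismatch concrete, here is a minimal sketch of what I imagine would be needed, using a plain PyTorch `nn.Linear` as a stand-in for the decoder rather than the real model. The input size of 128, the new vocab size of 30000, and the helper name `resize_decoder` are all my own assumptions for illustration, not part of the library. It copies the rows for the first `min(old, new)` token ids, which would only make sense if those ids mean the same tokens in both vocabularies.

```python
import torch
from torch import nn


def resize_decoder(old: nn.Linear, new_vocab_size: int) -> nn.Linear:
    """Hypothetical helper: build a decoder with a new output size,
    copying the rows of the old weight and bias that still fit."""
    new = nn.Linear(old.in_features, new_vocab_size, bias=old.bias is not None)
    n = min(old.out_features, new_vocab_size)
    with torch.no_grad():
        # Rows beyond n (if any) keep their fresh random initialization.
        new.weight[:n] = old.weight[:n]
        if old.bias is not None:
            new.bias[:n] = old.bias[:n]
    return new


# Stand-in for cls.predictions.decoder; 128 is an assumed input size.
old_decoder = nn.Linear(128, 21128)
# 30000 is an assumed size for the new vocab.txt.
new_decoder = resize_decoder(old_decoder, 30000)
print(new_decoder.weight.shape)
```

If the decoder weights are tied to the input word embeddings, I suppose the embedding matrix would need the same treatment, but I am not sure whether that is the right approach here.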
Thank you very much~