Open herbertchen1 opened 6 years ago
I don't think you can easily change the word embedding, since its dimension must match the output dimension of each layer, which is 768 (cfg.n_embd) for the pre-trained model.
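To make the constraint concrete, here is a minimal sketch in plain Python. It assumes a standard transformer layout in which each layer's output is added back onto the hidden state (a residual connection), which is one reason the widths have to agree. The `Config` class and `residual_add` helper are hypothetical stand-ins for illustration, not the library's actual code; only the name `n_embd` and the value 768 come from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the model config; only n_embd is from the thread.
    n_embd: int = 768


def residual_add(hidden, layer_out):
    """Element-wise residual sum; both vectors must have the same length."""
    if len(hidden) != len(layer_out):
        raise ValueError(
            f"dimension mismatch: {len(hidden)} vs {len(layer_out)}"
        )
    return [h + o for h, o in zip(hidden, layer_out)]


cfg = Config()

# A token embedding with the pre-trained width passes through the residual path.
token_embedding = [0.0] * cfg.n_embd
layer_output = [1.0] * cfg.n_embd
hidden = residual_add(token_embedding, layer_output)

# A narrower embedding (300 is an arbitrary illustrative width) does not.
try:
    residual_add([0.0] * 300, layer_output)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Under these assumptions, swapping in embeddings of a different width would also mean adding a projection to `n_embd` or retraining the layers, which is why it is not an easy change.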
Training a new language model from scratch is indeed quite expensive and tedious.
And is training the LM very hard...?