A first experiment with a randomly initialized decoder is `eight-snub` (hidden commit: 65e898b).
Check out this experiment (on the `inspect` branch) to use its associated model with `dvc exp apply eight-snub`.
This model was trained with a randomly initialized version of `dbmdz/bert-base-historic-multilingual-cased`
as the decoder (and accordingly also uses the associated tokenizer for the decoder). At first glance, the predictions of this model do not look substantially better or worse than those of a model whose decoder was initialized from a BERT model pre-trained on modern German. This should be investigated further.
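For reference, a minimal sketch of how a randomly initialized decoder can be paired with a pre-trained encoder using Hugging Face `transformers`; this is an assumption about the setup, not the exact code used for `eight-snub`, and the encoder checkpoint below is only a placeholder:

```python
from transformers import (
    AutoConfig,
    AutoModel,
    AutoTokenizer,
    BertLMHeadModel,
    EncoderDecoderModel,
)

decoder_name = "dbmdz/bert-base-historic-multilingual-cased"
encoder_name = "dbmdz/bert-base-historic-multilingual-cased"  # placeholder; actual encoder checkpoint may differ

# Encoder: pre-trained weights loaded from the hub.
encoder = AutoModel.from_pretrained(encoder_name)

# Decoder: same architecture and tokenizer as the checkpoint, but with randomly
# initialized weights, obtained by building the model from the config alone
# (no call to from_pretrained()).
decoder_config = AutoConfig.from_pretrained(decoder_name)
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True
decoder = BertLMHeadModel(decoder_config)

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# The decoder still uses the tokenizer associated with the checkpoint.
decoder_tokenizer = AutoTokenizer.from_pretrained(decoder_name)
```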
How well does the model work if we replace the pre-trained decoder with a randomly initialized one (BERT2Rnd)? See this blog post and Rothe et al. (2020).