ybracke opened 1 year ago
A first experiment with a randomly initialized encoder and decoder is `pudgy-jear`
(hidden commit: `e7a7ab7`).
Check out this experiment with `dvc exp apply pudgy-jear`
to inspect its associated model.
This model was trained with a randomly initialized version of `dbmdz/bert-base-historic-multilingual-cased`
as both the encoder and the decoder, on 100_000 training examples from dtak-1600-1699 (CAB normalized) for 3 epochs. At first glance, the predictions of this model look worse than those of a model initialized with a pre-trained historic encoder. However, the loss is still decreasing in epoch 3, so with enough training this model might still perform as well as one with pre-training. This should be investigated further.
How well does the model work if we replace the pre-trained encoder (and decoder) with a randomly initialized one (Rnd2Rnd)?