I am trying to use this repository to train a language model with an additional input. My data looks like this:
The labels look like this:
Since my objective is quite different from the original training script, I implemented the training loop from scratch. However, I noticed that it takes much longer than a simple LSTM model to become somewhat decent, and the output is still not fully coherent language even after 15 epochs on 2 million sentences. I am getting outputs that look like this:
Gold label: In most cases , accurate results can only be achieved after a laborious and expensive trial and error process .
Output: only most accurate cases can be achieved after a laborious error and process results In trial and expensive suit.
Currently I am using a small model with 4 layers and 2 attention heads per layer.
I randomly initialized the position encodings and multiplied them by 0.1 to match the variance of my word embeddings.
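To make the scaling concrete, here is a minimal NumPy sketch of what I mean; the vocabulary size, sequence length, model dimension, embedding std of 0.1, and the random seed are all hypothetical placeholders, not values from my actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, max_len, d_model = 10_000, 512, 128  # hypothetical sizes

# Word embeddings drawn with std 0.1 (a common initialization scale).
word_emb = rng.normal(0.0, 0.1, size=(vocab_size, d_model))

# Position encodings drawn from a standard normal, then multiplied by 0.1
# so their variance roughly matches that of the word embeddings before
# the two are summed at the model input.
pos_enc = rng.normal(0.0, 1.0, size=(max_len, d_model)) * 0.1

print(f"word emb std: {word_emb.std():.3f}, pos enc std: {pos_enc.std():.3f}")
```

After the scaling, both tensors have a standard deviation of roughly 0.1, which is the point of multiplying by 0.1.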
Any ideas on what I could have missed?
Here is some of my code: