Hi,
Thanks for such a clean implementation and documentation!
I've been trying to train a language model (to generate text) with a linear attention transformer, but I'm having trouble batchifying my dataset. I followed this colab to build the model, and I can build it and pass a placeholder input through it successfully. However, I'm not sure how to pack textual data into batches in order to train the language model on the GPU.
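For reference, this is roughly how I'm building the model and passing a placeholder input through it. The hyperparameters are just values I picked (not the colab's), using the `LinearAttentionTransformerLM` constructor from the README:

```python
import torch
from linear_attention_transformer import LinearAttentionTransformerLM

# hyperparameters are placeholders I picked, not the colab's settings
model = LinearAttentionTransformerLM(
    num_tokens = 20000,   # vocabulary size
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 1024,
    causal = True         # autoregressive, for text generation
).cuda()

# placeholder batch of token ids, shape (batch, seq_len) -- this part works
x = torch.randint(0, 20000, (1, 1024)).cuda()
logits = model(x)         # logits over the vocabulary for each position
```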
I've also followed this tutorial to load and batchify the data. Could you please provide an example of how to train a linear attention transformer? Perhaps the code used to train the handwritten digit generator from the colab?
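To be concrete, this is the kind of batching I'm trying to get working, adapted from the tutorial's batchify idea. `TOKENS`, `TextSamplerDataset`, and the loop below are just a sketch of my attempt (the corpus tensor here is random placeholder data). Is something like this the right way to feed text batches to the model?

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

SEQ_LEN = 1024   # should not exceed the model's max_seq_len

class TextSamplerDataset(Dataset):
    """Cuts a long 1-D tensor of token ids into fixed-length training windows."""
    def __init__(self, data, seq_len):
        self.data = data
        self.seq_len = seq_len

    def __len__(self):
        return (self.data.size(0) - 1) // self.seq_len

    def __getitem__(self, i):
        start = i * self.seq_len
        # grab one extra token so input and target can be shifted by one position
        chunk = self.data[start : start + self.seq_len + 1]
        return chunk[:-1], chunk[1:]

# TOKENS stands in for my real tokenized corpus (a 1-D LongTensor of token ids)
TOKENS = torch.randint(0, 20000, (1_000_000,))
loader = DataLoader(TextSamplerDataset(TOKENS, SEQ_LEN), batch_size=8, shuffle=True)

optim = torch.optim.Adam(model.parameters(), lr=1e-4)  # model from the snippet above
for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    logits = model(inputs)                              # (batch, seq_len, num_tokens)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
    break   # just checking that a single step runs
```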
Thanks again!