Hi,
Thanks for such a clean implementation and documentation!
I've been trying to train a language model (to generate text) with a linear attention transformer, but I'm having trouble batchifying my dataset. I followed this colab to build the model, and I can build it and pass a placeholder input through it successfully. However, I'm not sure how to pack textual data into batches in order to train the language model on the GPU.
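For reference, this is roughly how I'm building the model and passing a placeholder input through it. The hyperparameters are just values I picked (not the colab's), using the `LinearAttentionTransformerLM` constructor from the README:

```python
import torch
from linear_attention_transformer import LinearAttentionTransformerLM

# hyperparameters are placeholders I picked, not the colab's settings
model = LinearAttentionTransformerLM(
    num_tokens = 20000,   # vocabulary size
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 1024,
    causal = True         # autoregressive, for text generation
).cuda()

# placeholder batch of token ids, shape (batch, seq_len) -- this part works
x = torch.randint(0, 20000, (1, 1024)).cuda()
logits = model(x)         # logits over the vocabulary for each position
```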
I've also followed this tutorial to load and batchify the data. Could you please provide an example of how to train a linear attention transformer? Perhaps the code used to train the handwritten digit generator from the colab?
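To be concrete, this is the kind of batching I'm trying to get working, adapted from the tutorial's batchify idea. `TOKENS`, `TextSamplerDataset`, and the loop below are just a sketch of my attempt (the corpus tensor here is random placeholder data). Is something like this the right way to feed text batches to the model?

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

SEQ_LEN = 1024   # should not exceed the model's max_seq_len

class TextSamplerDataset(Dataset):
    """Cuts a long 1-D tensor of token ids into fixed-length training windows."""
    def __init__(self, data, seq_len):
        self.data = data
        self.seq_len = seq_len

    def __len__(self):
        return (self.data.size(0) - 1) // self.seq_len

    def __getitem__(self, i):
        start = i * self.seq_len
        # grab one extra token so input and target can be shifted by one position
        chunk = self.data[start : start + self.seq_len + 1]
        return chunk[:-1], chunk[1:]

# TOKENS stands in for my real tokenized corpus (a 1-D LongTensor of token ids)
TOKENS = torch.randint(0, 20000, (1_000_000,))
loader = DataLoader(TextSamplerDataset(TOKENS, SEQ_LEN), batch_size=8, shuffle=True)

optim = torch.optim.Adam(model.parameters(), lr=1e-4)  # model from the snippet above
for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    logits = model(inputs)                              # (batch, seq_len, num_tokens)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
    break   # just checking that a single step runs
```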
Thanks again!