I am using 3 Tesla V100 GPUs to train a Transformer-based model with fairseq. The parameters are the same as in the given training command. However, each epoch takes a long time (more than 2 hours). Is this normal? I'd like to know how long each epoch (or the full run) took when you trained the model for this research. Thank you~
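For reference, my invocation looks roughly like the sketch below. This is only an illustration of the kind of command I am running: the data path, architecture name, and hyperparameter values here are placeholders, and the real values follow the training command given in this repo.

```bash
# Illustrative sketch only: data-bin/my-dataset and the hyperparameter values
# are placeholders; the actual arguments follow the repo's training command.
# CUDA_VISIBLE_DEVICES selects the 3 V100s used for data-parallel training.
CUDA_VISIBLE_DEVICES=0,1,2 fairseq-train data-bin/my-dataset \
    --arch transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --dropout 0.3 --weight-decay 0.0001 \
    --max-tokens 4096 --fp16
```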