oliverguhr / fullstop-deep-punctuation-prediction

A model that predicts the punctuation of English, Italian, French and German texts.
https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large
MIT License

Info on training #8

Closed · orlink closed this 2 years ago

orlink commented 2 years ago

What sort of provider and hardware did you use to train it? How long should I expect training to take? Just a bit of info to orient myself. Thank you very much for the great project!

oliverguhr commented 2 years ago

I trained the base models on a 2080 Ti; with the data of all 4 languages, training takes about 3 hours for 3 epochs. The large models were trained on a 3090; with the same data, training took about a full day for 3 epochs.
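In case it helps to see the shape of the training code: it's essentially standard token-classification fine-tuning with the Hugging Face Trainer. Here is a minimal sketch; the label set, checkpoint name, toy data, and hyperparameters are illustrative assumptions, not the exact released configuration:

```python
# Minimal fine-tuning sketch for punctuation prediction as token classification.
# Labels, checkpoint, and hyperparameters below are placeholders, not the
# exact values used for the released models.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

# Hypothetical label set: one class per punctuation mark following a word.
labels = ["0", ".", ",", "?", "-", ":"]
label2id = {l: i for i, l in enumerate(labels)}

checkpoint = "xlm-roberta-base"  # assumption: base runs use xlm-roberta-base
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels))

def tokenize_and_align(batch):
    # Tokenize pre-split words and assign one label per word; extra sub-word
    # pieces get -100 so the loss ignores them.
    tokenized = tokenizer(batch["words"], is_split_into_words=True,
                          truncation=True)
    all_labels = []
    for i, word_labels in enumerate(batch["labels"]):
        previous, ids = None, []
        for word_id in tokenized.word_ids(batch_index=i):
            if word_id is None or word_id == previous:
                ids.append(-100)
            else:
                ids.append(label2id[word_labels[word_id]])
            previous = word_id
        all_labels.append(ids)
    tokenized["labels"] = all_labels
    return tokenized

# Tiny in-memory example; in practice this would be the Europarl-derived data.
train = Dataset.from_dict({
    "words": [["hello", "world", "how", "are", "you"]],
    "labels": [["0", ".", "0", "0", "?"]],
}).map(tokenize_and_align, batched=True, remove_columns=["words"])

args = TrainingArguments("punctuation-model", num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train,
        data_collator=DataCollatorForTokenClassification(tokenizer)).train()
```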

orlink commented 2 years ago

Thank you very much for the quick reply! I just started training the base model for a single language (Bulgarian) on an Amazon ml.g4dn.xlarge, which has 14 GB of RAM and one T4 GPU. This is the progress so far: 22% 1364/6276 [10:22<38:40, 2.12it/s]. It seems the T4 has about half the TFLOPS of a 2080 Ti, but a bit more Tensor TFLOPS (https://www.microway.com/knowledge-center-articles/comparison-of-nvidia-geforce-gpus-and-nvidia-tesla-gpus/).

I need punctuation for all languages, so I'm planning to train it on all of the Europarl languages first, and then maybe on others. I saw that the current Hugging Face base model works surprisingly well, though not perfectly, on languages it had not been trained on (I tried Bulgarian, Turkish, Russian, and Gujarati), so I'm hoping that training on more of the Europarl languages will make it transfer even better to the rest. Please let me know if you have any tips in that regard. I also wonder about speeding up inference, even though the base model is not bad at about 50 ms on Hugging Face.
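For anyone testing transfer to other languages or measuring latency locally, here is a minimal inference sketch using the deepmultilingualpunctuation package that accompanies this model; I'm taking the class and method names from the model card, so treat them as an assumption if your package version differs. For speedup, running on GPU or exporting the model to ONNX are the usual options, though I haven't benchmarked either here.

```python
# Minimal inference sketch (pip install deepmultilingualpunctuation).
# The checkpoint name is the released multilang model; restore_punctuation
# returns the input text with predicted punctuation inserted.
from deepmultilingualpunctuation import PunctuationModel

model = PunctuationModel(model="oliverguhr/fullstop-punctuation-multilang-large")
text = "my name is clara and i live in berkeley california"
print(model.restore_punctuation(text))
```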