microsoft / AzureML-BERT

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
https://azure.microsoft.com/en-us/blog/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale/

Question on time estimates for 'reasonable' pre-/re-training on a 'reasonable' dataset size #59

Closed eip2016num1 closed 4 years ago

eip2016num1 commented 4 years ago

Hi, This is awesome - just what we were looking for.

We need a quick estimate of the time 'T' it will take to pre-/re-train BERT-en-lg (i.e., take BERT and continue training on a custom new corpus - I assume that will be better than training from scratch) with a dataset of size 'S' on a GPU of type 'G'.

Let's say we use the GPUs and Wikipedia dataset suggested in this notebook to pretrain a new model until it performs as well as the original BERT-en-lg on GLUE (or pick any other task).

How long will that take?

thanks much!
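As a rough back-of-the-envelope answer, wall-clock pretraining time is approximately the total number of tokens processed divided by the aggregate cluster throughput. The Python sketch below just illustrates that arithmetic; the corpus size, per-GPU throughput, and scaling-efficiency figures are assumptions/placeholders to be replaced with measurements for your GPU type 'G', not numbers from this repo.

```python
# Back-of-the-envelope estimate only: every number passed in below is an
# assumption/placeholder, not a measurement from this repo.

def estimate_pretraining_days(num_tokens, epochs, tokens_per_sec_per_gpu,
                              num_gpus, scaling_efficiency=0.9):
    """Estimate wall-clock time T (in days) to make `epochs` passes over a
    corpus of `num_tokens` tokens on `num_gpus` GPUs with a measured
    per-GPU throughput and an assumed multi-GPU scaling efficiency."""
    effective_throughput = tokens_per_sec_per_gpu * num_gpus * scaling_efficiency
    seconds = (num_tokens * epochs) / effective_throughput
    return seconds / 86400.0

# Hypothetical usage: a 3e9-token corpus, 2 passes, 16 GPUs, and a
# placeholder per-GPU throughput you would measure for your own hardware.
print(estimate_pretraining_days(num_tokens=3e9, epochs=2,
                                tokens_per_sec_per_gpu=2_000, num_gpus=16))
```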

eip2016num1 commented 4 years ago

Never mind - found this: https://azure.microsoft.com/en-us/blog/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale/

skaarthik commented 4 years ago

@eip2016num1 FYI, there is a more recent implementation of BERT pretraining on AzureML at https://github.com/microsoft/onnxruntime-training-examples/tree/master/nvidia-bert that is significantly faster than the implementation in this repo. More details on the speedup are at https://techcommunity.microsoft.com/t5/azure-ai/onnx-runtime-training-technical-deep-dive/ba-p/1398310.
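For reference, here is a minimal sketch of submitting such a pretraining script to an AzureML GPU cluster with the AzureML Python SDK (v1). The cluster name, environment name, source directory, entry-point script, and arguments are placeholders for illustration, not the exact assets from either repo.

```python
# Minimal AzureML (SDK v1) job-submission sketch. All resource names below
# (cluster, environment, script, arguments) are placeholders.
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()                         # reads a local config.json
compute_target = ws.compute_targets["gpu-cluster"]   # existing GPU cluster (placeholder name)

# Placeholder: an environment with PyTorch (and, for the ORT example,
# onnxruntime-training) already registered in the workspace.
env = Environment.get(ws, name="my-bert-pretraining-env")

config = ScriptRunConfig(
    source_directory="./pretraining",   # folder containing the training code (placeholder)
    script="run_pretraining.py",        # placeholder entry point
    arguments=["--train_batch_size", "64", "--max_steps", "100000"],
    compute_target=compute_target,
    environment=env,
)

# Submit the run and stream logs until it finishes.
run = Experiment(ws, "bert-pretraining").submit(config)
run.wait_for_completion(show_output=True)
```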