I also have this question, for whenever someone gets to it. I don't think this is doable with this package as it stands; there's probably a way to hack it, but you'd likely have to strip out some of the code at the beginning of the pipeline. @yiranxijie
Is there any news on this? Training one of these models from scratch?
@mattivi not yet
Hi all, training from scratch will probably never be a goal for the present repo, but here are some great transformer codebases that have been scaled to more than 64 GPUs:
Note that the typical compute required to train BERT is about 64 GPUs for 4 days (which currently means around $10k-15k if you are renting cloud compute). TPU training is not currently possible in PyTorch; if you need TPUs, you should use a TensorFlow repo (the original BERT implementation or tensor2tensor, for instance).
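For a rough sense of where that price range comes from, here is a minimal back-of-the-envelope sketch; the per-GPU hourly rate is an assumed cloud price, not a figure quoted in this thread:

```python
# Back-of-the-envelope cost estimate for 64 GPUs over 4 days.
# The $2.50/GPU-hour rate is an assumed on-demand cloud price, not a quoted figure.
gpus = 64
days = 4
hourly_rate_per_gpu = 2.50  # USD, assumed

total_gpu_hours = gpus * days * 24          # 6144 GPU-hours
total_cost = total_gpu_hours * hourly_rate_per_gpu
print(f"{total_gpu_hours} GPU-hours ~= ${total_cost:,.0f}")  # 6144 GPU-hours ~= $15,360
```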
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
How can we train on our own domain-specific data instead of using the pre-trained models?
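As a starting point for that question, below is a minimal sketch of continuing masked-LM pre-training on a domain corpus. It uses the later Hugging Face `transformers` and `datasets` APIs rather than this repo's original scripts, and the file name, checkpoint, and hyperparameters are placeholders rather than values from this thread:

```python
# Hedged sketch: adapt a masked language model to a domain-specific corpus.
# Assumes `transformers` and `datasets` are installed and `domain_corpus.txt`
# is a plain-text file with one passage per line (placeholder name).
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Warm start from the public checkpoint; for a true from-scratch run you would
# build the model from a fresh BertConfig instead.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamically masks 15% of tokens for the masked-LM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="domain-bert",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

Warm-starting from an existing checkpoint like this is usually far cheaper than the full from-scratch run discussed above; training truly from scratch would also require training a tokenizer on your own corpus and the kind of multi-GPU budget mentioned earlier in the thread.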