Open yiranxijie opened 5 years ago
How can we train on our own domain-specific data instead of using a pre-trained model?

That seems beyond the scope of this repository, but https://towardsdatascience.com/pre-training-bert-from-scratch-with-cloud-tpu-6e2f71028379 is a quite good description of doing a full training of BERT models from your own data.

Another approach is called "further pre-training", which continues the original pre-training on in-domain text rather than starting from scratch. Has anyone in this community tried this? Here's a paper that shows favourable "domain-specific pre-training" outcomes.
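To make the "further pre-training" idea concrete: it continues BERT's masked-language-model (MLM) objective on your own domain corpus. Below is a minimal pure-Python sketch of the BERT-style masking scheme at the heart of that objective (mask 15% of positions; of those, 80% become `[MASK]`, 10% a random token, 10% stay unchanged). The function name, the tiny example sentence, and the use of the sequence's own tokens as a stand-in vocabulary are all illustrative assumptions, not code from any particular repository.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """BERT-style masking sketch (illustrative, not library code).

    Each position is selected with probability mask_prob. A selected
    token is replaced by [MASK] 80% of the time, by a random token 10%,
    and left unchanged 10%. Returns (masked_tokens, labels), where
    labels holds the original token at selected positions and None
    elsewhere -- the model is trained to predict only the labelled ones.
    """
    rng = random.Random(seed)
    vocab = list(set(tokens))  # stand-in vocabulary for the sketch
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # target: predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append(MASK_TOKEN)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)  # kept, but still predicted
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels

# Hypothetical in-domain sentence (e.g. clinical text):
corpus = "the patient was given aspirin for acute myocardial infarction".split()
masked, labels = mask_tokens(corpus, seed=0)
```

In practice you would run this objective with the pre-trained weights as the starting point (e.g. via the Hugging Face `run_mlm.py` example script) rather than with randomly initialized ones; that is what distinguishes further pre-training from training from scratch.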