How to train our own domain-specific data instead of using pre-training models?

jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

https://clip-as-service.jina.ai


12.39k stars, 2.06k forks

How to train our own domain-specific data instead of using pre-training models? #331

Open · yiranxijie opened this issue 5 years ago

yiranxijie commented 5 years ago

How can we train on our own domain-specific data instead of using pre-trained models?

PeterisP commented 5 years ago

That seems beyond the scope of this repository, but https://towardsdatascience.com/pre-training-bert-from-scratch-with-cloud-tpu-6e2f71028379 is quite a good description of how to fully train BERT models on your own data.

boxabirds commented 5 years ago

Another approach is called "further pre-training", which builds on the existing "horizontal" (general-domain) pre-training. Has anyone in this community tried it? Here's a paper that shows favourable outcomes from "domain-specific pre-training"
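Full pre-training from scratch and further pre-training both optimize BERT's masked language modeling objective on the domain corpus. The difference is whether the weights start from random initialization or from an existing checkpoint. Below is a minimal sketch of how masked training examples are built. It assumes the masking recipe described for the original BERT: select about 15% of the non-special tokens, then replace 80% of those with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged. The function name `mask_tokens` and the toy vocabulary are hypothetical. A real pipeline would use the model's WordPiece tokenizer and vocabulary.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}  # special tokens are never masked


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[int], List[str]]:
    """Hypothetical helper: build one BERT-style masked LM example.

    Returns (masked_tokens, masked_positions, original_labels).
    """
    rng = random.Random(seed)
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    # Mask roughly mask_prob of the candidates, but at least one token.
    num_to_mask = min(max(1, round(len(candidates) * mask_prob)), len(candidates))
    positions = sorted(rng.sample(candidates, num_to_mask))

    output = list(tokens)
    labels = []
    for pos in positions:
        labels.append(tokens[pos])  # the model must predict the original token
        r = rng.random()
        if r < 0.8:
            output[pos] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            output[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return output, positions, labels


if __name__ == "__main__":
    sentence = ["[CLS]", "the", "patient", "was", "given", "aspirin", "[SEP]"]
    toy_vocab = ["the", "patient", "was", "given", "aspirin", "dose", "mg"]
    print(mask_tokens(sentence, toy_vocab))
```

The same example-building step feeds either approach. For further pre-training you would load the released BERT checkpoint and continue training on these examples. Training from scratch instead begins from freshly initialized weights, and usually a vocabulary built from the domain corpus.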