kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

use custom vocab.txt #66

Closed 2696120622 closed 4 years ago

2696120622 commented 4 years ago

My corpus consists of pure numbers like 1, 2, ..., 1000000, ..., 1002342, .... It is different from words in any language. Can I replace the vocab.txt with my own vocab.txt created from my corpus for fine-tuning BERT? Or should I train BERT on my corpus from scratch?

Thanks.

kpe commented 4 years ago

Pre-training from scratch would always work. To use a pre-trained model you have to use the exact same tokenizer that the pre-trained model uses.
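
For illustration, a minimal sketch of the fine-tuning case: the tokenizer has to be built from the vocab.txt that ships with the pre-trained checkpoint, otherwise the token ids no longer line up with the checkpoint's embedding table. The `model_dir` path below is an assumption (a stock Google BERT checkpoint unpacked locally), not something from this repo.

```python
import os
from bert import bert_tokenization

# Assumed location of a downloaded stock BERT checkpoint (uncased_L-12_H-768_A-12).
model_dir = ".models/uncased_L-12_H-768_A-12"

# Use the vocab.txt bundled with the checkpoint, not a custom vocabulary;
# a different vocab would remap ids and invalidate the pre-trained embeddings.
tokenizer = bert_tokenization.FullTokenizer(
    vocab_file=os.path.join(model_dir, "vocab.txt"),
    do_lower_case=True,
)

# Numbers like "1002342" get split into word pieces from the stock vocabulary.
tokens = ["[CLS]"] + tokenizer.tokenize("1002342") + ["[SEP]"]
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, token_ids)
```

A corpus-specific vocab.txt only makes sense together with pre-training from scratch, since the embedding table then gets learned for that vocabulary.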