Hi @usuyama, you can find the details here: https://github.com/lanwuwei/BERTOverflow
Thank you.
For others' reference, here is the pretraining command I found in https://github.com/lanwuwei/BERTOverflow:
python3 run_pretraining.py \
--input_file=gs://softbert_data/processed_data/*.tfrecord \
--output_dir=gs://softbert_data/model_base/ \
--do_train=True \
--do_eval=True \
--bert_config_file=gs://softbert_data/model_base/bert_config.json \
--train_batch_size=512 \
--max_seq_length=128 \
--max_predictions_per_seq=20 \
--num_train_steps=1500000 \
--num_warmup_steps=10000 \
--learning_rate=1e-4 \
--use_tpu=True \
--tpu_name=$TPU_NAME \
--save_checkpoints_steps=100000
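Note that the *.tfrecord files the command reads are produced in a separate step. In the standard google-research/bert setup that step is create_pretraining_data.py; here is a rough sketch with placeholder paths and my own guess at the casing, not confirmed by the authors:

# max_seq_length and max_predictions_per_seq must match the run_pretraining.py command above.
python3 create_pretraining_data.py \
--input_file=./corpus/stackoverflow_sentences.txt \
--output_file=gs://YOUR_BUCKET/processed_data/part-0000.tfrecord \
--vocab_file=./vocab.txt \
--do_lower_case=False \
--max_seq_length=128 \
--max_predictions_per_seq=20 \
--masked_lm_prob=0.15 \
--dupe_factor=5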
I'd appreciate it if you could help me with the other two questions when you have time, @jeniyat.
Q: How did you decide on 64,000 as the WordPiece vocabulary size?
A: We experimented with different vocabulary sizes, and 64k gave the best results.
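For anyone wanting to reproduce that step, a 64k subword vocabulary can be trained with SentencePiece, for example. This is only an illustrative sketch using BPE via spm_train, not necessarily the exact WordPiece tooling used for BERTOverflow, and the file names are placeholders:

# Trains a 64k BPE vocabulary over the raw corpus (one sentence per line).
spm_train \
--input=stackoverflow_sentences.txt \
--model_prefix=so_vocab_64k \
--vocab_size=64000 \
--model_type=bpe \
--character_coverage=0.9995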
Q: Have you tried continual pretraining from bert-base using the unlabeled data (152 million sentences from Stack Overflow)?
A: No. That would be an interesting experiment to do.
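For reference, run_pretraining.py in google-research/bert can warm-start from an existing checkpoint via --init_checkpoint, so a continual-pretraining run would look roughly like the command above plus that flag. A sketch with placeholder paths (not something the authors ran); note that the data would have to be re-tokenized with bert-base's original vocab, since the embedding sizes must match the checkpoint:

# $BERT_BASE_DIR points at a downloaded bert-base checkpoint (config, vocab, and weights).
python3 run_pretraining.py \
--input_file=gs://YOUR_BUCKET/processed_data_bertbase_vocab/*.tfrecord \
--output_dir=gs://YOUR_BUCKET/model_continual/ \
--do_train=True \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--train_batch_size=512 \
--max_seq_length=128 \
--max_predictions_per_seq=20 \
--num_train_steps=100000 \
--num_warmup_steps=10000 \
--learning_rate=1e-4 \
--use_tpu=True \
--tpu_name=$TPU_NAME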
Thanks for sharing your great work.
Some quick questions about the BERT pretraining:
Thank you, Naoto