Pretraining problem - Githubissues

Hello,

I want to do extra pretraining on the bertje model on domain specific texts and I use the pretraining code from the original BERT code. I downloaded the model from the huggingface model hub and I need to use the .ckpt files. I cannot download the model via the code as I don't have access to the internet from where I work, so I have a folder of the bert-base-dutch-cased model.

When I try to run the pretraining code I get this error:

2021-01-22 10:17:42.271665: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271697: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271698: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271725: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271734: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271737: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271750: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271773: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271787: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
2021-01-22 10:17:42.271797: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
INFO:tensorflow:training_loop marked as finished
I0122 10:17:42.276760 139671864993600 error_handling.py:115] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0122 10:17:42.276864 139671864993600 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "/home/amber/Documents/bert/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)

I can get the pretraining working with the original BERT checkpoints.

The command I use is:

python run_pretraining.py --bert_config_file="bert-base-dutch-cased/config.json" --input_file="tf_examples.tfrecord" --init_checkpoint="bert-base-dutch-cased/model.ckpt" --output_dir="output_dir" --max_seq_length=16 --max_predictions_per_seq=20 --do_train=True --do_eval=True --train_batch_size=1 --eval_batch_size=1 --learning_rate=1e-4 --num_train_steps=20 --num_warmup_steps=20 --save_checkpoints_setps=20 --iterations_per_loop=20 --max_level_steps=20

Do you maybe know what is going wrong?

wietsedv / bertje

Pretraining problem #20