mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0

Failed to finetune BERTResolver from the released checkpoint #41

Closed: HaixiaChai closed this issue 4 years ago

HaixiaChai commented 4 years ago

Hi, I am trying to restore your released checkpoint and finetune the model on a new dataset, but it fails with the following error:

```
Traceback (most recent call last):
  File "train.py", line 23, in <module>
    model = util.get_model(config)
  File "/home/chaiha/bert-e2e-coref/util.py", line 21, in get_model
    return independent.CorefModel(config)
  File "/home/chaiha/bert-e2e-coref/independent.py", line 61, in __init__
    init_from_checkpoint(config['init_checkpoint'], assignment_map)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 190, in init_from_checkpoint
    _init_from_checkpoint, args=(ckpt_dir_or_file, assignment_map))
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1516, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1524, in _merge_call
    return merge_fn(self._distribution_strategy, *args, **kwargs)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 229, in _init_from_checkpoint
    tensor_name_in_ckpt, str(variable_map[tensor_name_in_ckpt])
ValueError: Shape of variable coref_layer/slow_antecedent_scores/hidden_bias_0:0 ((1000,)) doesn't match with shape of tensor coref_layer/slow_antecedent_scores/hidden_bias_0 ([3000]) from checkpoint reader.
```

Here is my configuration:

```
finetune_bert_base = ${best} {
  num_docs = 2802
  bert_learning_rate = 1e-05
  task_learning_rate = 0.0002
  max_segment_len = 128
  ffnn_size = 3000
  train_path = ${data_dir}/train.english.128.jsonlines
  eval_path = ${data_dir}/test.english.128.jsonlines
  conll_eval_path = ${data_dir}/test.english.v4_gold_conll
  max_training_sentences = 11
  bert_config_file = ${best.log_root}/bert_base/bert_config.json
  vocab_file = ${best.log_root}/bert_base/vocab.txt
  tf_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
  init_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
}
```

I also made a copy of the 'bert_base' folder into a folder called 'finetune_bert_base', and that still fails. Thank you for any thoughts!
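For readers unfamiliar with the config format: `finetune_bert_base = ${best} { ... }` is HOCON-style inheritance, where the new block starts from every key in `best` and local keys override it, so a value like `ffnn_size` set here takes precedence over the base config. A minimal pure-Python sketch of that resolution (plain dicts, not real HOCON; the example values are hypothetical, chosen only for illustration):

```python
def resolve(base, overrides):
    """Merge overrides on top of a base config, like HOCON `${base} { ... }`."""
    merged = dict(base)      # start from every key in the base block
    merged.update(overrides)  # local keys override inherited ones
    return merged

# Hypothetical base values for illustration only.
best = {"ffnn_size": 1000, "max_segment_len": 384}
finetune_bert_base = resolve(best, {"ffnn_size": 3000, "max_segment_len": 128})

print(finetune_bert_base["ffnn_size"])  # prints 3000: the local override wins
```

This is why a shape mismatch against the checkpoint can appear even when the base config was correct: any override in the finetune block changes the sizes of the variables being built.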

mandarjoshi90 commented 4 years ago

Could you try changing ffnn_size to 1000?
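Applied to the config block above, the suggested change would look like the following sketch (all other fields unchanged; this assumes, per the error message, that the released checkpoint's `coref_layer` weights were built with a 1000-unit FFNN):

```
finetune_bert_base = ${best} {
  # ... other fields as in the original config ...
  ffnn_size = 1000  # must match the hidden size stored in the released checkpoint
}
```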