neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

Is it possible to test BERT-LSTM or BERT-LSTM-CRF with the BERT-CRF pre-trained model for NER? #22

Closed: alcidesmig closed this 3 years ago

alcidesmig commented 3 years ago

Hello!

Regarding NER: is it possible to use the BERTimbau Base - BERT-CRF model (total scenario, 10 classes) available in ner_evaluation/ for the fine-tuning experiments, or are those expected to start from {mBERT, clean BERTimbau, ...}?

Trying to do what I described in the title, I executed:

python run_bert_harem.py \
    --bert_model bertimbau-base_bert-crf_total \
    --labels_file labels \
    --do_train \
    --train_file data/FirstHAREM-total-train.json \
    --valid_file data/FirstHAREM-total-dev.json \
    --freeze_bert \
    --pooler sum \
    --num_train_epochs 50 \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --do_eval \
    --eval_file data/MiniHAREM-total.json \
    --output_dir output_bert-lstm-crf_total

The result was:

10/20/2020 00:10:37 - INFO - pytorch_transformers.modeling_utils -   Weights of BertLSTMCRF not initialized from pretrained model: ['loss_fct.weight', 'lstm.weight_ih_l0', 'lstm.weight_hh_l0', 'lstm.bias_ih_l0', 'lstm.bias_hh_l0', 'lstm.weight_ih_l0_reverse', 'lstm.weight_hh_l0_reverse', 'lstm.bias_ih_l0_reverse', 'lstm.bias_hh_l0_reverse']
Traceback (most recent call last):
  File "run_bert_harem.py", line 132, in <module>
    get_eval_metrics_fn=get_eval_metrics_fn,
  File "{directory}/ner_evaluation/trainer.py", line 677, in main
    model = load_model(args, args.bert_model, training=args.do_train)
  File "{directory}/ner_evaluation/utils.py", line 46, in load_model
    **model_kwargs)
  File "{directory}/anaconda3/envs/bert_crf/lib/python3.6/site-packages/pytorch_transformers/modeling_utils.py", line 532,
in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BertLSTMCRF:
        size mismatch for classifier.weight: copying a param with shape torch.Size([21, 768]) from checkpoint, the shape in current model is torch.Size([21, 200]).

Thank you for your attention.

fabiocapsouza commented 3 years ago

Hi @alcidesmig ,

The command you tried to run will not work because the BERT-CRF checkpoint has weights for 1) BERT, 2) a linear classifier layer of shape (21, 768) and 3) a CRF layer, while the BERT-LSTM-CRF model expects weights for 1) BERT, 2) an LSTM, 3) a linear layer named classifier of shape (21, 200) and 4) a CRF layer. So there is a shape mismatch for the classifier layer, and the LSTM layer would get random weights in any case.
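
For intuition, here is a minimal sketch of where the two classifier shapes come from. The LSTM hidden size of 100 per direction is an assumption inferred from 200 = 2 * 100 for a bidirectional LSTM, not something stated in the repo:

import torch
import torch.nn as nn

bert_output = torch.randn(1, 10, 768)  # BERT hidden states: (batch, seq, 768)

# BERT-CRF: the classifier sits directly on BERT, so its weight is (21, 768)
clf_bert_crf = nn.Linear(768, 21)

# BERT-LSTM-CRF: a bidirectional LSTM (assumed 100 units per direction) sits
# in between, so the classifier sees 200 features and its weight is (21, 200)
lstm = nn.LSTM(768, 100, bidirectional=True, batch_first=True)
lstm_out, _ = lstm(bert_output)        # (1, 10, 200)
clf_bert_lstm_crf = nn.Linear(200, 21)

print(clf_bert_crf.weight.shape)       # torch.Size([21, 768])
print(clf_bert_lstm_crf.weight.shape)  # torch.Size([21, 200])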

If I understand your question correctly, you want to load a BERT-CRF checkpoint (trained on FirstHAREM) and continue training on another dataset. Is that right? I think you can do that, but you will have to modify the checkpoint to delete the weights of the layers that have the same name but different shapes. I believe this code would solve it (I haven't tested it):

import torch

state_dict = torch.load('path/to/pytorch_weights.bin')
del state_dict['classifier']  # if using an LSTM or the number of classes differs
del state_dict['crf']  # if the number of classes differs

torch.save(state_dict, 'path/to/pytorch_weights.bin')  # overwrite weights

This way the model will load the available weights in the checkpoint and leave the new layers with random weights.
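
To see the mechanism in isolation: pytorch_transformers' from_pretrained matches checkpoint keys against the model's parameters and logs the leftovers, as in the "Weights of BertLSTMCRF not initialized from pretrained model" message above. Here is a toy sketch of the same behavior using plain PyTorch with strict=False, not the repo's actual code path:

import torch
import torch.nn as nn

# Toy model standing in for BERT-LSTM-CRF's extra layers
model = nn.Sequential(nn.Linear(768, 200), nn.Linear(200, 21))

# A "checkpoint" that only contains weights for the first layer
checkpoint = {'0.weight': torch.randn(200, 768), '0.bias': torch.randn(200)}

# Matching keys are loaded; missing ones keep their random initialization
missing, unexpected = model.load_state_dict(checkpoint, strict=False)
print(missing)  # ['1.weight', '1.bias'] -> these stay randomly initialized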

alcidesmig commented 3 years ago

Hi, @fabiocapsouza!

I got it working by doing:

del state_dict['crf.transitions']
del state_dict['crf.end_transitions']
del state_dict['crf.start_transitions']
del state_dict['crf.classifier.bias']
del state_dict['classifier.bias']
del state_dict['classifier.weight']

Thank you!

CarlosEduardoSaMotta commented 3 years ago

Hi,

I tried to delete those state_dict keys, as suggested by @alcidesmig, but the error persists: "RuntimeError: Error(s) in loading state_dict for BertLSTMCRF: size mismatch for classifier.weight: copying a param with shape torch.Size([21, 768]) from checkpoint, the shape in current model is torch.Size([21, 200])."

But the checkpoint's key listing ends with:

...
bert.encoder.layer.11.output.LayerNorm.bias: torch.Size([768])
bert.pooler.dense.weight: torch.Size([768, 768])
bert.pooler.dense.bias: torch.Size([768])
classifier.weight: torch.Size([21, 768])
classifier.bias: torch.Size([21])
crf.start_transitions: torch.Size([21])
crf.end_transitions: torch.Size([21])
crf.transitions: torch.Size([21, 21])

I can't see this [21, 200] layer anywhere.

Any clues?

Thanks.

alcidesmig commented 3 years ago

Hi, @CarlosEduardoSaMotta. Can you post the code you used to delete the keys in state_dict?

CarlosEduardoSaMotta commented 3 years ago

Hi, @alcidesmig.

import torch
state_dict = torch.load('/content/portuguese-bert/ner_evaluation/bertimbau-base_bert-crf_total/pytorch_model.bin')
print('\n'.join([k + ': ' + str(state_dict[k].shape) for k in state_dict.keys()]))

And I got:

...
bert.encoder.layer.11.output.LayerNorm.bias: torch.Size([768])
bert.pooler.dense.weight: torch.Size([768, 768])
bert.pooler.dense.bias: torch.Size([768])
classifier.weight: torch.Size([21, 768])
classifier.bias: torch.Size([21])
crf.start_transitions: torch.Size([21])
crf.end_transitions: torch.Size([21])
crf.transitions: torch.Size([21, 21])

Then:

del state_dict['crf.transitions']
del state_dict['crf.end_transitions']
del state_dict['crf.start_transitions']
#del state_dict['crf.classifier.bias']
del state_dict['classifier.bias']
del state_dict['classifier.weight']

!python run_bert_harem.py \
    --bert_model /content/portuguese-bert/ner_evaluation/bertimbau-base_bert-crf_total \
    --labels_file data/classes-total.txt \
    --do_train \
    --train_file data/FirstHAREM-total-train.json \
    --valid_file data/FirstHAREM-total-dev.json \
    --freeze_bert \
    --pooler sum \
    --num_train_epochs 50 \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --do_eval \
    --eval_file data/MiniHAREM-total.json \
    --output_dir output_bert-lstm-crf_total

And the error message:

RuntimeError: Error(s) in loading state_dict for BertLSTMCRF: size mismatch for classifier.weight: copying a param with shape torch.Size([21, 768]) from checkpoint, the shape in current model is torch.Size([21, 200]).

alcidesmig commented 3 years ago

Hi again, @CarlosEduardoSaMotta.

What did you use to save the model after deleting classifier.weight?

CarlosEduardoSaMotta commented 3 years ago

Oops! I am loading from disk but never saved the modified state_dict back. Thanks.
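
For completeness, the missing step was persisting the trimmed state_dict back to disk before rerunning run_bert_harem.py. A sketch combining the snippets above, using the checkpoint path from earlier in this thread:

import torch

path = '/content/portuguese-bert/ner_evaluation/bertimbau-base_bert-crf_total/pytorch_model.bin'
state_dict = torch.load(path)

for key in ['crf.transitions', 'crf.end_transitions', 'crf.start_transitions',
            'classifier.bias', 'classifier.weight']:
    state_dict.pop(key, None)  # pop with a default avoids KeyError for absent keys

torch.save(state_dict, path)  # overwrite the file so the deletions actually persist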