mindspore-ai / mindspore

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
https://gitee.com/mindspore/mindspore
Apache License 2.0

Pre-trained model for BERT fine-tuning with Squad v1.1 #143

Closed marwage closed 3 years ago

marwage commented 3 years ago

Hi Mindspore team,

We would like to fine-tune BERT with the SQuAD v1.1 dataset. To do so, we run the script model_zoo/official/nlp/bert/scripts/run_squad.sh.

Unfortunately, we get the following error: ValueError: For 'TensorAdd', the x_shape [32, 384, 768] and y_shape [1, 128, 768] can not broadcast.

To us, it looks like the sequence length of SQuAD is 384, whereas the pre-trained model was trained with a sequence length of 128. We used the pre-trained model from https://download.mindspore.cn/model_zoo/r1.1/bertbase_ascend_v111_zhwiki_offical_nlp_bs256_loss3/bertbase_ascend_v111_zhwiki_offical_nlp_bs256_loss3.7.ckpt. Additionally, the pre-trained model was trained on Chinese text whereas SQuAD is in English.

Could you provide us with a pre-trained BERT model that works with the Squad v1.1 dataset?

Vincent34 commented 3 years ago

Hi there,

The pre-trained BERT checkpoints work with any sequence length up to max_position_embeddings, which is 512 by default.

When fine-tuning with a different dataset, you need to change the seq_length option in finetune_eval_config.py. https://github.com/mindspore-ai/mindspore/blob/master/model_zoo/official/nlp/bert/src/finetune_eval_config.py#L48
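
For example, a minimal sketch of that change, assuming the model_zoo layout where run_squad.py reads bert_net_cfg from src/finetune_eval_config.py (you can either edit seq_length in the file itself or override it before training starts):

from src.finetune_eval_config import bert_net_cfg

# SQuAD v1.1 features are tokenized to a sequence length of 384, while the
# zh-wiki checkpoint was pre-trained with seq_length 128; only this field needs
# to change, since the position embeddings cover max_position_embeddings (512).
bert_net_cfg.seq_length = 384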

marwage commented 3 years ago

Thank you for your response! It resolved the error.

We have a new error though:

RuntimeError: mindspore/ccsrc/runtime/device/gpu/kernel_info_setter.cc:83 SupportedTypeList] Unsupported op [NPUAllocFloatStatus]

We assume that we get this error because we are trying to run the fine-tuning on a GPU while the pre-training was done on Ascend. Is this correct, and do we need a pre-trained model that was trained on a GPU?

Additionally, there is still the issue that the model we have access to is trained in Chinese and Squad is in English.

Vincent34 commented 3 years ago

The pre-trained model from MindSpore works with all device targets, including GPU, NPU, and CPU. The checkpoint only contains the parameter weights; nothing in it is tied to a specific device.

NPUAllocFloatStatus is an operation that checks the overflow status on Ascend, and it should not be used when training on GPU. https://github.com/mindspore-ai/mindspore/blob/68f49c5ff1920ae30680c185d983ba1962cae9b0/model_zoo/official/nlp/bert/src/bert_for_finetune.py#L86

However, I found that run_squad.py uses BertSquadCell for training instead of BertFinetuneCell, and BertSquadCell does not handle GPU yet. Could you please help complete it by following the corresponding function in BertFinetuneCell?
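
For reference, a minimal sketch of the device-dependent operator selection used in BertFinetuneCell, which BertSquadCell would also need; only the operator choice is shown, and the wiring into the training cell's construct is omitted:

from mindspore import context
from mindspore.ops import operations as P

gpu_target = context.get_context("device_target") == "GPU"
if gpu_target:
    # On GPU there is no hardware status register, so overflow is detected by
    # applying FloatStatus to the gradients and summing the flags with AddN.
    float_status = P.FloatStatus()
    addn = P.AddN()
else:
    # On Ascend, dedicated ops allocate, read and clear a hardware overflow flag.
    alloc_status = P.NPUAllocFloatStatus()
    get_status = P.NPUGetFloatStatus()
    clear_status = P.NPUClearFloatStatus()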

We really appreciate your contribution.

Vincent34 commented 3 years ago

The latest MindSpore release, 1.2.0, provides the general methods start_overflow_check and get_overflow_status to check for overflow, and they are already used in BERT pre-training. https://github.com/mindspore-ai/mindspore/blob/68f49c5ff1920ae30680c185d983ba1962cae9b0/model_zoo/official/nlp/bert/src/bert_for_pre_training.py#L336

You may have a try using this method as well.
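
A minimal, hedged sketch of where those helpers fit in a custom training cell; the real cell also computes and applies scaled gradients, which is omitted here, and everything outside the two helper calls is an assumption:

import mindspore.nn as nn

class OverflowCheckSketchCell(nn.TrainOneStepWithLossScaleCell):
    """Sketch only: shows where start_overflow_check / get_overflow_status go."""

    def construct(self, data, label, scaling_sens):
        loss = self.network(data, label)
        # Clears the Ascend overflow status (a no-op on GPU) and threads a
        # dependency through scaling_sens so the clear runs before backprop.
        status, scaling_sens = self.start_overflow_check(loss, scaling_sens)
        # In the real cell the tuple of (scaled) gradients is passed here.
        cond = self.get_overflow_status(status, (loss,))
        return loss, cond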

marwage commented 3 years ago

We solved the error by adjusting BertSquadCell according to BertFinetuneCell. If you would like to see the difference, head over to https://github.com/kungfu-ml/mindspore-bert/blob/bert/model_zoo/official/nlp/bert/src/bert_for_finetune.py

We are making progress but encountered a new error.

[ERROR] KERNEL(989,python):2021-05-31-08:57:03.687.960 [mindspore/ccsrc/backend/kernel_compiler/gpu/nn/softmax_gpu_kernel.h:222] InitSizeByAxisLastDim] Input is 3-D, but axis(1) is invalid.
Traceback (most recent call last):
  File "/home/marcel/Mindspore/bert_mindspore/scripts/../run_squad.py", line 218, in <module>
    run_squad()
  File "/home/marcel/Mindspore/bert_mindspore/scripts/../run_squad.py", line 183, in run_squad
    do_train(ds, netwithloss, load_pretrain_checkpoint_path, save_finetune_checkpoint_path, epoch_num)
  File "/home/marcel/Mindspore/bert_mindspore/scripts/../run_squad.py", line 82, in do_train
    model.train(epoch_num, dataset, callbacks=callbacks)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
    sink_size=sink_size)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
    self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/train/model.py", line 452, in _train_dataset_sink_process
    outputs = self._train_network(*inputs)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/nn/cell.py", line 331, in __call__
    out = self.compile_and_run(*inputs)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/nn/cell.py", line 588, in compile_and_run
    self.compile(*inputs)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/nn/cell.py", line 575, in compile
    _executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/home/marcel/Mindspore/p3venv/lib/python3.7/site-packages/mindspore/common/api.py", line 502, in compile
    result = self._executor.compile(obj, args_list, phase, use_vm)
RuntimeError: mindspore/ccsrc/backend/kernel_compiler/gpu/nn/softmax_gpu_kernel.h:222 InitSizeByAxisLastDim] Input is 3-D, but axis(1) is invalid.

Thanks a lot for helping us!

Vincent34 commented 3 years ago

This seems to be an internal error in the GPU Softmax kernel. As far as I know, the GPU Softmax operator with an axis other than -1 is still under development.

You may apply Transpose before Softmax in the network like this:

self.perm = (0, 2, 1)
self.transpose = P.Transpose()
self.softmax = P.Softmax(axis=-1)
...
...
  x_transpose = self.transpose(x, self.perm)
  y_transpose = self.softmax(x_transpose)
  y = self.transpose(y_transpose, self.perm)

As for this situation, I think this softmax op may be the activation in BertSquadModel: https://github.com/mindspore-ai/mindspore/blob/aeaf2d14b35345d79df4fe92ca7d76e79457d5a5/model_zoo/official/nlp/bert/src/finetune_eval_model.py#L74

marwage commented 3 years ago

We modified the code according to your template. It looks like this now: https://github.com/kungfu-ml/mindspore-bert/commit/c35d3327d7f92ce194c9849fd76a1202b30651ce

The error changed slightly.

RuntimeError: mindspore/ccsrc/backend/kernel_compiler/gpu/nn/softmax_grad_gpu_kernel.h:128 Init] Input is 3-D, but softmax grad only supports 2-D inputs.

Vincent34 commented 3 years ago

:joy:

Then maybe you have to reshape the 3-D input into a 2-D input using Reshape, or Squeeze and ExpandDims, for example:

self.perm = (0, 2, 1)
self.transpose = P.Transpose()
self.softmax = P.Softmax(axis=-1)
...
...
        x_transpose = self.transpose(x, self.perm)
        x_shape = F.shape(x_transpose)
        x_reshape = F.reshape(x_transpose, (x_shape[0] * x_shape[1], -1))
        y_reshape = self.softmax(x_reshape)
        y_transpose = F.reshape(y_reshape, x_shape)
        y = self.transpose(y_transpose, self.perm)

marwage commented 3 years ago

We get the same error when doing the transpose. Note, though, that BERT SQuAD uses log_softmax instead of softmax; maybe there is an implementation difference. Is there any way you could fix BertSquadCell and send us working code?

CaitinZhao commented 3 years ago

class BertSquadModel(nn.Cell):
    '''
    This class is responsible for SQuAD
    '''
    def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False):
        super(BertSquadModel, self).__init__()
        if not is_training:
            config.hidden_dropout_prob = 0.0
            config.hidden_probs_dropout_prob = 0.0
        self.bert = BertModel(config, is_training, use_one_hot_embeddings)
        self.weight_init = TruncatedNormal(config.initializer_range)
        self.dense1 = nn.Dense(config.hidden_size, num_labels, weight_init=self.weight_init,
                               has_bias=True).to_float(config.compute_type)
        self.num_labels = num_labels
        self.dtype = config.dtype
        self.log_softmax = P.LogSoftmax(axis=-1)
        self.is_training = is_training

    def construct(self, input_ids, input_mask, token_type_id):
        sequence_output, _, _ = self.bert(input_ids, token_type_id, input_mask)
        batch_size, seq_length, hidden_size = P.Shape()(sequence_output)
        sequence = P.Reshape()(sequence_output, (-1, hidden_size))
        logits = self.dense1(sequence)
        logits = P.Cast()(logits, self.dtype)
        logits = P.Reshape()(logits, (batch_size, seq_length, self.num_labels))
        logits = P.Transpose()(logits, (0, 2, 1))
        logits = self.log_softmax(logits)
        logits = P.Transpose()(logits, (0, 2, 1))
        return logits

I tried this BertSquadModel on MindSpore 1.2 and got a loss successfully.

marwage commented 3 years ago

We get the same error with MindSpore v1.2.0. We needed to modify the file bert_for_finetune.py so that it runs on a GPU. Did you use a GPU? Would it be possible for you to provide us with the data needed by the run_squad.sh script? That would rule out the possibility that we have the wrong files.

Vincent34 commented 3 years ago

We use the same data as https://deepai.org/dataset/squad1-1-dev:

$ md5sum *
3e85deb501d4e538b6bc56f786231552 *dev-v1.1.json
981b29407e0affa3b1b156f72073b945 *train-v1.1.json

@CaitinZhao Could you please provide a fixed branch ready to finetune with GPU?

marwage commented 3 years ago

@Vincent34 and are you using https://github.com/google-research/bert/blob/eedf5716ce1268e56f0a50264a88cafad334ac61/run_squad.py to create the tfrecord file from it? If so, then we are using the same data, and it would mean we are out of ideas about what the issue could be.

@CaitinZhao it would be very nice if you could do that.

CaitinZhao commented 3 years ago

I pushed my code to Gitee; you can download it and have a try. https://gitee.com/zhao_ting_v/mindspore/tree/bert/ @marwage

Vincent34 commented 3 years ago

> I pushed my code to Gitee; you can download it and have a try. https://gitee.com/zhao_ting_v/mindspore/tree/bert/ @marwage

You can check this for the modification. https://gitee.com/zhao_ting_v/mindspore/commit/d17b439b2e3dd33c262b5e4fea3ee0131b0163b5

> @Vincent34 and are you using https://github.com/google-research/bert/blob/eedf5716ce1268e56f0a50264a88cafad334ac61/run_squad.py to create the tfrecord file from it? If so, then we are using the same data, and it would mean we are out of ideas about what the issue could be.

Yes, I use run_squad.py from Google to create the tfrecord.

marwage commented 3 years ago

Good news! It is working for us. It does not work with version 1.1.0, but it does with version 1.2.0. We will try to build our project against version 1.2.0. Thank you for your help!

One outstanding issue though is that the pre-trained model is in Chinese. Do you have a pre-trained model in English?

Update: Substituting your finetune_eval_model.py lets us run the fine-tuning with version 1.1.0.

Vincent34 commented 3 years ago

Sorry, we haven't provided a pre-trained model in English yet.

But I have a checkpoint converter of my own, which can be used to transfer weights from a Google pre-trained model to a MindSpore one. Would this help you?

https://gist.github.com/Vincent34/b1300463453d7433f1dfe9494d5cdf7e

marwage commented 3 years ago

That would work just fine. ms2tf_config.py is basically just a dictionary. Do you have the corresponding script that translates the TF model to an MS model?

Vincent34 commented 3 years ago

> That would work just fine. ms2tf_config.py is basically just a dictionary. Do you have the corresponding script that translates the TF model to an MS model?

You can just pass the argument transfer_option with the value tf2ms when running ms_and_tf_checkpoint_transfer_tools.py.

marwage commented 3 years ago

Where can we find the ms_and_tf_checkpoint_transfer_tools.py script? It does not seem to be part of MindSpore's repository.

Vincent34 commented 3 years ago

> Where can we find the ms_and_tf_checkpoint_transfer_tools.py script? It does not seem to be part of MindSpore's repository.

It's part of my gist. There are two scripts there:

https://gist.github.com/Vincent34/b1300463453d7433f1dfe9494d5cdf7e#file-ms_and_tf_checkpoint_transfer_tools-py

marwage commented 3 years ago

Oh, sorry! I did not see the second file...

Unfortunately, the checkpoint from https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz does not work with your script. The exact error is:


Traceback (most recent call last):
  File "ms_and_tf_checkpoint_transfer_tools.py", line 129, in <module>
    main()
  File "ms_and_tf_checkpoint_transfer_tools.py", line 123, in main
    convert_tf_2_ms(tf_ckpt_path, ms_ckpt_path, new_ckpt_path)
  File "ms_and_tf_checkpoint_transfer_tools.py", line 76, in convert_tf_2_ms
    data = tf.train.load_variable(tf_ckpt_path, tf_name)
  File "/home/marcel/Mindspore/kf-ms-venv/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 85, in load_variable
    reader = load_checkpoint(ckpt_dir_or_file)
  File "/home/marcel/Mindspore/kf-ms-venv/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 67, in load_checkpoint
    "given directory %s" % ckpt_dir_or_file)
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory /home/marcel/Mindspore/convert_model/uncased_L-24_H-1024_A-16

We looked at the TensorFlow BERT code. We suspect that we need to do something like this:

with strategy.scope():
  # Prediction always uses float32, even if training uses mixed precision.
  tf.keras.mixed_precision.set_global_policy('float32')
  squad_model, _ = bert_models.squad_model(
      bert_config,
      input_meta_data['max_seq_length'],
      hub_module_url=FLAGS.hub_module_url)

if checkpoint_path is None:
  checkpoint_path = tf.train.latest_checkpoint(FLAGS.model_dir)
logging.info('Restoring checkpoints from %s', checkpoint_path)
checkpoint = tf.train.Checkpoint(model=squad_model)
checkpoint.restore(checkpoint_path).expect_partial()
return squad_model

https://github.com/tensorflow/models/blob/1fa648a753b877f18ca3a1de9bb921c3f024c11d/official/nlp/bert/run_squad_helper.py

We cannot make it work though. Could you have a look?

Vincent34 commented 3 years ago

It should work like this:

python ms_and_tf_checkpoint_transfer_tools.py \
    --tf_ckpt_path=uncased_L-24_H-1024_A-16/bert_model.ckpt \
    --ms_ckpt_path=bert_ms.ckpt \
    --new_ckpt_path=bert_new.ckpt \
    --transfer_option=tf2ms

I guess you just passed the directory path without the checkpoint name. The TF checkpoint should consist of three files:

bert_model.ckpt.index
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.meta

The checkpoint name to pass is bert_model.ckpt, i.e. the common prefix without the extra suffixes.

marwage commented 3 years ago

Thank you!

Now the error is:

Traceback (most recent call last):
  File "ms_and_tf_checkpoint_transfer_tools.py", line 129, in <module>
    main()
  File "ms_and_tf_checkpoint_transfer_tools.py", line 123, in main    convert_tf_2_ms(tf_ckpt_path, ms_ckpt_path, new_ckpt_path)
  File "ms_and_tf_checkpoint_transfer_tools.py", line 77, in convert_tf_2_ms    ms_shape = ms_param_dict[ms_name].data.shape
KeyError: 'bert.bert.bert_embedding_postprocessor.token_type_embedding.embedding_table'

Is the following MindSpore model wrong? https://download.mindspore.cn/model_zoo/r1.1/bertbase_ascend_v111_zhwiki_offical_nlp_bs256_loss3/bertbase_ascend_v111_zhwiki_offical_nlp_bs256_loss3.7.ckpt

Vincent34 commented 3 years ago

My script converts checkpoints from version 1.2, whose weight names differ slightly from v1.1. We switched to the built-in Embedding in version 1.2, so the weight names of the embeddings changed.

1.1 vs. 1.2

"bert.bert.bert_embedding_postprocessor.embedding_table": "bert.bert.bert_embedding_postprocessor.token_type_embedding.embedding_table
"bert.bert.bert_embedding_postprocessor.full_position_embeddings": "bert.bert.bert_embedding_postprocessor.full_position_embedding.embedding_table"

You could modify the names in ms2tf_config.py directly, or just run the pre-training job to get a checkpoint from version 1.2. The saved weights don't matter and will be overwritten anyway.
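
A hedged sketch of that rename in ms2tf_config.py; the dictionary name and the TF-side variable names are assumptions based on Google's BERT naming, so check them against the gist before use:

# Only the two embedding entries differ between the 1.1 and 1.2 weight names;
# to keep using the v1.1 zh-wiki checkpoint as ms_ckpt_path, replace the v1.2
# keys with their v1.1 equivalents and leave every other entry untouched.
ms2tf_param_dict = {
    "bert.bert.bert_embedding_postprocessor.embedding_table":
        "bert/embeddings/token_type_embeddings",
    "bert.bert.bert_embedding_postprocessor.full_position_embeddings":
        "bert/embeddings/position_embeddings",
    # ... all other entries stay exactly as in the gist
}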

marwage commented 3 years ago

Thanks! The conversion worked after also deleting the layers in the dictionary that are not part of BERT-base.

Unfortunately, it seems like a transpose needs to happen at some point.

RuntimeError: Net parameters bert.bert.bert_embedding_lookup.embedding_table shape((30522, 768)) different from parameter_dict's((768, 30522))

Vincent34 commented 3 years ago

That's weird.

Maybe you can fix it by applying a transpose directly after load_checkpoint in run_squad.py, and then use save_checkpoint to save a fixed checkpoint.
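
A minimal sketch of that idea; the parameter name comes from the error above, while the file names and the hard-coded shape are assumptions:

from mindspore import Tensor
from mindspore.train.serialization import load_checkpoint, save_checkpoint

param_dict = load_checkpoint("bert_new.ckpt")
target = "bert.bert.bert_embedding_lookup.embedding_table"

fixed = []
for name, param in param_dict.items():
    data = param.data.asnumpy()
    if name == target and data.shape == (768, 30522):
        # The converted table is transposed relative to what the network expects.
        data = data.transpose()
    fixed.append({"name": name, "data": Tensor(data)})

save_checkpoint(fixed, "bert_new_fixed.ckpt")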

Vincent34 commented 3 years ago

I have figured it out.

Did you just modify the weight names in ms2tf_config.py and still pass the pre-trained checkpoint from MindSpore for zh-wiki as ms_ckpt_path?

The embedding shapes for zh and en are different, because the vocab_size for zh is 21128 while the vocab_size for en is 30522. That makes the following shape check fail, which results in a superfluous transpose. https://gist.github.com/Vincent34/b1300463453d7433f1dfe9494d5cdf7e#file-ms_and_tf_checkpoint_transfer_tools-py-L81

Maybe you can solve this by adding more conditions to that shape check.
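
For example, a hedged sketch of a stricter guard around that transpose; the function and variable names are assumptions rather than the gist's, and the idea is to transpose only when the transposed TF shape really matches the MindSpore parameter, so a pure vocab-size mismatch is left alone:

def should_transpose(tf_shape, ms_shape):
    """Only transpose 2-D kernels whose transposed shape matches the MindSpore
    parameter; a vocab-size mismatch like TF (30522, 768) vs. zh (21128, 768)
    is left untouched instead of being transposed."""
    return len(tf_shape) == 2 and tuple(ms_shape) == tuple(reversed(tf_shape))

# The failing case from above: keep the (30522, 768) embedding table as-is.
print(should_transpose((30522, 768), (21128, 768)))  # False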

marwage commented 3 years ago

Thank you! You gave us the right hint. We were using a pre-trained MindSpore model. The trick was to adjust the seq_length in the file src/config.py.

Thank you for all your help! We can close this issue now :)

amanwalia123 commented 3 years ago

Hi @marwage, I am working on BERT for MindSpore as well and am interested in your findings. Were you able to achieve the accuracy on SQuAD v1.1 reported in the BERT paper? Do you think you could share your code?

marwage commented 3 years ago

Hi @amanwalia123, you can find our code at https://github.com/kungfu-ml/mindspore-bert. So far, we have run the experiments for only one epoch, so I cannot say anything about the accuracy yet.

amanwalia123 commented 3 years ago

It's really helpful that you shared the code; I really appreciate it. If possible, can you share your findings once it is finished?