kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

Can't train BERT with loaded weights on QA Task #86

Closed · danielesposito27 closed this issue 3 years ago

danielesposito27 commented 3 years ago

Hey everyone! Thanks for this awesome repo.

I have some BERT weights that I obtained through custom BERT pretraining on a Spanish corpus. I loaded them as indicated by the team in previous issues:

params = bert.params_from_pretrained_ckpt('/content/drive/MyDrive/Proyectos IA/Vicky chatbot/entrenamiento_tf/berto/berto_checkpoint/berto_manuales')

berto = bert.BertModelLayer.from_params(params)

model = tf.keras.models.Sequential([berto])
model.build((3, 128))

bert.load_bert_weights(berto, '/content/drive/MyDrive/Proyectos IA/Vicky chatbot/entrenamiento_tf/berto/berto_checkpoint/berto_manuales/model.ckpt-20')

After loading my weights, I try to pass the model input data that looks like this:

{'input_mask': <tf.Tensor: shape=(4, 384), dtype=int32, numpy=
 array([[1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0]], dtype=int32)>,
 'input_type_ids': <tf.Tensor: shape=(4, 384), dtype=int32, numpy=
 array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], dtype=int32)>,
 'input_word_ids': <tf.Tensor: shape=(4, 384), dtype=int32, numpy=
 array([[    4,  1036,  4920, ...,     0,     0,     0],
        [    4,     3,  1013, ...,     0,     0,     0],
        [    4,     3, 12767, ...,     0,     0,     0],
        [    4,  1030,  1641, ...,     0,     0,     0]], dtype=int32)>}

As you know, these are inputs from the SQuAD dataset. The problem is that I get the following error message:

Layer bert_model_layer_14 expects 1 input(s), but it received 3 input tensors. Inputs received: [<tf.Tensor: shape=(4, 384), dtype=int32, numpy=
array([[1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0]], dtype=int32)>, <tf.Tensor: shape=(4, 384), dtype=int32, numpy=
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int32)>, <tf.Tensor: shape=(4, 384), dtype=int32, numpy=
array([[    4,  1036,  4920, ...,     0,     0,     0],
       [    4,     3,  1013, ...,     0,     0,     0],
       [    4,     3, 12767, ...,     0,     0,     0],
       [    4,  1030,  1641, ...,     0,     0,     0]], dtype=int32)>]

By the way, I don't get this error when I load a pretrained model from TensorFlow Hub, but I need to use my own weights for this problem.

Thanks in advance!

kpe commented 3 years ago

I believe that by calling build() like this:

   model.build((3, 128))

you are effectively telling the BertModelLayer to expect a single tensor of token_ids (with batch_size=3) and to use the default segment type 0, which is why the model then complains that it expects 1 input but received 3:

https://github.com/kpe/bert-for-tf2/blob/55f6a6fd5d8ea14f96ee19938b7a1bf0cb26aaea/bert/model.py#L45-L54
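
For reference, a minimal sketch of the two ways the layer can be built (l_bert stands for the BertModelLayer instance, i.e. berto above; the shape values are just placeholders):

# a) single shape: the layer expects ONE tensor of token ids and
#    assumes token_type_id 0 everywhere (this is what build((3, 128)) does)
l_bert.build(input_shape=(None, 128))

# b) list of two shapes: the layer expects [token_ids, token_type_ids]
l_bert.build(input_shape=[(None, 128), (None, 128)])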

If you want to specify segment ids, i.e. token_type_ids, try passing a list of shapes, i.e. something like:

l_input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')

# provide a custom token_type/segment id as a layer input
output = l_bert([l_input_ids, l_token_type_ids])          # [batch_size, max_seq_len, hidden_size]
model = keras.Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)
model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])
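
With a model built like that, the SQuAD-style features from above are fed as a list of two tensors. A sketch of the call (features here is just a placeholder name for the dict you showed):

# pass token ids and segment/type ids as two separate inputs (sketch)
seq_output = model([features['input_word_ids'], features['input_type_ids']])
# seq_output has shape [batch_size, max_seq_len, hidden_size]
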
danielesposito27 commented 3 years ago

Hey kpe! Thank you for your answer; I implemented the code to accept the additional inputs. Is there a way to make it accept input_mask as a third input as well, or should I include that in another way? It seems like the base code only allows up to two inputs. Thanks in advance, and I apologize for asking again; any help is kindly appreciated (beginner here)!

kpe commented 3 years ago

@danielesposito27 - the mask is auto-generated internally (i.e. all-zero token_ids get masked), and I believe there is no way to explicitly specify it (but it should not be needed in most cases).
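
Conceptually (a sketch of the behaviour, not the exact code in this repo), the mask is derived from the padding, so an explicit input_mask tensor is not needed:

# positions whose token id is 0 (padding) are treated as masked,
# roughly equivalent to:
input_mask = tf.cast(tf.not_equal(input_word_ids, 0), tf.int32)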