kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

Can not load pretrained bert weights when loading chinese_L-12_H-768_A-12/bert_model.ckpt #80

Closed yangxudong closed 3 years ago

yangxudong commented 3 years ago

Here is my code snippet:

import numpy as np
import bert

from tensorflow.keras.layers import Input, Masking, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

max_seq_len = 128
bert_params = bert.params_from_pretrained_ckpt(model_dir)
bert_layer = bert.BertModelLayer.from_params(bert_params, name="bert")

input_ids = Input(shape=(max_seq_len,), dtype='int32')
masked = Masking(mask_value=0)(input_ids)
emb = bert_layer.embeddings_layer(masked)  # shape: (None, seq_len, emb_size)
...
mask_ids = np.expand_dims(np.tile(np.array(tokenizer.convert_tokens_to_ids(["[MASK]"])), max_seq_len), 0)
emb_mask = bert_layer.embeddings_layer(mask_ids)  # shape(1, seq_len, emb_size)
new_emb = err_prob * emb_mask + (1. - err_prob) * emb  # broadcast, shape(None, seq_len, emb_size)
output = bert_layer.encoders_layer(new_emb)  # bert_layer itself takes input_ids, not already-embedded data
output = Dense(num_classes, activation='softmax')(output + emb)
correct_model = Model(input_ids, output)
correct_model.build(input_shape=(None, max_seq_len))
bert.load_bert_weights(bert_layer, model_ckpt)
correct_model.compile(optimizer=Adam(1e-3))
correct_model.summary()

When I run it, I get the following error while loading the pretrained weights. Can anyone help me? Thanks!

Traceback (most recent call last):
  File "/Users/weisu.yxd/PycharmProjects/PY3/soft_mask_bert.py", line 103, in <module>
    bert.load_bert_weights(bert_layer, model_ckpt)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bert/loader.py", line 206, in load_stock_weights
    prefix = bert_prefix(bert)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bert/loader.py", line 186, in bert_prefix
    assert match, "Unexpected bert layer: {} weight:{}".format(bert, bert.weights[0].name)
AssertionError: Unexpected bert layer: <bert.model.BertModelLayer object at 0x102d56be0> weight:embeddings/word_embeddings/embeddings:0
kpe commented 3 years ago

I'm not able to reproduce this; could you try posting a minimal but complete executable example, i.e. something like:

import os
import bert

from tensorflow import keras

model_name = "chinese_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in Keras Model here, and call model.build()
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=(128,)),
    l_bert,
    keras.layers.Lambda(lambda x: x[:, 0, :]),
    keras.layers.Dense(2)
])
model.build(input_shape=(None, 128))

bert.load_bert_weights(l_bert, model_ckpt)
model.summary()
kpe commented 3 years ago

ohh... I see, you are trying to replace/extend the default embeddings layer, cool!

I believe it is because the l_bert instance is not part of the graph; more precisely, because the weights get instantiated here (outside of any context/scope):

output = bert_layer.encoders_layer(new_emb)  

the prefix/name_scope is missing. As a workaround, you could put the relevant pieces (or everything) in a name_scope like this:

from tensorflow.python.keras import backend as K

# https://github.com/tensorflow/tensorflow/issues/27298
with K.get_graph().as_default(), K.name_scope('bert'):
  emb_mask = bert_layer.embeddings_layer(mask_ids)  # shape(1, seq_len, emb_size)
  output = bert_layer.encoders_layer(new_emb)

as a minimal example:

import os
import bert

from tensorflow import keras
from tensorflow.python.keras import backend as K

model_name = "chinese_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)

# https://github.com/tensorflow/tensorflow/issues/27298
with K.get_graph().as_default(), K.name_scope('bert'):
    l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
    inp_ids = keras.layers.Input(shape=(128,), dtype='int32')
    new_emb = l_bert.embeddings_layer(inp_ids)
    output = l_bert.encoders_layer(new_emb)
    output = keras.layers.Dense(3, activation='softmax')(output + new_emb)
    model = keras.models.Model(inp_ids, output, name='bert')

bert.load_bert_weights(l_bert, model_ckpt)
model.summary()

As an alternative, consider extending BertModelLayer and overriding the relevant methods (i.e. call(), build(), ...).
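A rough, untested sketch of that subclassing idea (the SoftMaskedBertLayer name and the fixed err_prob/mask_token_id values are made up for illustration; embeddings_layer and encoders_layer are called the same way as in the snippets above):

import bert
import tensorflow as tf


class SoftMaskedBertLayer(bert.BertModelLayer):
    # hypothetical subclass: blends the [MASK] embedding into the token
    # embeddings before running the encoder, like the snippet above
    err_prob = 0.15       # made-up fixed mixing weight; the original snippet uses a per-token err_prob
    mask_token_id = 103   # made up; use tokenizer.convert_tokens_to_ids(["[MASK]"])[0] instead

    def call(self, inputs, mask=None, training=None):
        emb = self.embeddings_layer(inputs)                        # (batch, seq_len, emb_size)
        mask_ids = tf.fill(tf.shape(inputs), self.mask_token_id)   # all-[MASK] input ids
        emb_mask = self.embeddings_layer(mask_ids)
        new_emb = self.err_prob * emb_mask + (1. - self.err_prob) * emb
        return self.encoders_layer(new_emb)

Since the sub-layers would then only be called from inside the layer's own call(), their weights should end up under the layer's name scope, so the name_scope workaround above should not be needed and load_bert_weights() should find the expected bert/ prefix.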

yangxudong commented 3 years ago

the prefix/name_scope is missing

Thanks for your reply. Yes, it was because the prefix/name_scope was missing. Your example works. Cool!