kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License
803 stars 193 forks

Error using Bert with GlobalAveragePooling1D: IndexError: list index out of range using GlobalAveragePooling1D #58

nectario closed 4 years ago

nectario commented 4 years ago

I am trying to use BERT to encode chunks of text. I take the 3D output from BERT and try to apply GlobalAveragePooling1D, but I get "list index out of range". Here's a snippet:

`
def bert_model(self, max_seq_len, number_of_labels=130, adapter_size=64, bert_config_file=None, bert_ckpt_file=None, bert_model_name=None):

    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        bc = StockBertConfig.from_json_string(reader.read())
        bert_params = map_stock_config_to_params(bc)
        bert_params.adapter_size = adapter_size
        bert = BertModelLayer.from_params(bert_params, name="bert")

    sentence_input = Input(shape=(max_seq_len,), dtype='float32', name="sentence_input_ids")
    subtitles_input = Input(shape=(self.max_shape[1], max_seq_len), dtype='int32', name="SubtitlesInput")

    bert_output = bert(sentence_input)
    bert_output = GlobalAveragePooling1D()(bert_output)

`

Traceback (most recent call last):
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 765, in <module>
    validation_labels=genre_prediction.validation_labels, epochs=1000, batch_size=1)
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 468, in fit_bert
    self.sentence_model, self.model = self.bert_model(max_sentence_length, bert_ckpt_file="D:/Development/Projects/bert_models/"+self.bert_model_name+"/bert_model.ckpt", bert_config_file="D:/Development/Projects/bert_models/"+self.bert_model_name+"/bert_config.json", number_of_labels=len(self.genres))
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 336, in bert_model
    subtitles_timedistributed = segment_time_distributed(subtitles_input)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 773, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\layers\wrappers.py", line 270, in call
    output_shape = self.compute_output_shape(input_shape).as_list()
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\layers\wrappers.py", line 212, in compute_output_shape
    child_output_shape = self.layer.compute_output_shape(child_input_shape)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 768, in compute_output_shape
    layer_output_shapes = layer.compute_output_shape(layer_input_shapes)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\layers\pooling.py", line 591, in compute_output_shape
    return tensor_shape.TensorShape([input_shape[0], input_shape[2]])
IndexError: list index out of range

kpe commented 4 years ago

take a look at the stacktrace:

File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 336, in bert_model
subtitles_timedistributed = segment_time_distributed(subtitles_input)

and see that the error is not related to the code you've posted :-)

nectario commented 4 years ago

This seems to be happening only with bert!

nectario commented 4 years ago

Here is my model function:

`
def bert_model(max_seq_len, number_of_labels=130, adapter_size=64, bert_config_file=None, bert_ckpt_file=None, bert_model_name=None):

    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        bert_params = params_from_pretrained_ckpt(bert_ckpt_file)
        l_bert = BertModelLayer.from_params(bert_params, name="bert")

    sentence_input = Input(shape=(max_seq_len,), dtype='float32', name="sentence_input_ids")

    bert_output = l_bert(sentence_input)
    bert_output = GlobalAveragePooling1D()(bert_output)

    bert_sentence_model = Model(sentence_input, bert_output)

    segment_time_distributed = TimeDistributed(bert_sentence_model, name="TimeDistributedSegment")
    segment_cnn = Conv1D(256, 2, padding="same", strides=1, activation="relu", name="Segment2Conv1D")
    segment_max_pool_2 = MaxPooling1D(pool_size=3, name="Segment2MaxPool1D")

    subtitles_input = Input(shape=(max_shape[1], max_seq_len), dtype='int32', name="SubtitlesInput")

    subtitles_timedistributed = segment_time_distributed(subtitles_input)
    subtitles_cnn = segment_cnn(subtitles_timedistributed)
    subtitles_maxpool = segment_max_pool_2(subtitles_cnn)

    subtitles_dropout = SpatialDropout1D(0.30, name="SubtitlesDropout")(subtitles_maxpool)
    subtitles_pre_attention_output = Dense(256, name="SubtitlesPreAttnOutput")(subtitles_dropout)

    attention_subtitles = Attention(name="SubtitlesAttention")([subtitles_pre_attention_output, subtitles_maxpool])

    subtitles_max_output = GlobalMaxPool1D(name="GlobalMaxPoolSubitles")(attention_subtitles)
    subtitles_avg_output = GlobalAveragePooling1D(name="GlobalAvgPoolSubitles")(attention_subtitles)

    concat_output = Concatenate(axis=-1, name="OutputConcatenate")([subtitles_max_output, subtitles_avg_output])
    dropout = Dropout(0.40)(concat_output)
    output = Dense(number_of_labels, activation="sigmoid", name="Output")(dropout)

    model = Model(inputs=subtitles_input, outputs=output)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy")

    bert_sentence_model.summary()
    model.summary()

    return bert_sentence_model, model
`

nectario commented 4 years ago

When I don't use the bert layer, I do not get this error.

kpe commented 4 years ago

check your shapes, i.e. try:

print("bert_output", bert_output.shape)
bert_output = GlobalAveragePooling1D()(bert_output)
print("bert_output", bert_output.shape)

if you comment out the GlobalAveragePooling1D layer, the shape expected from your TimeDistributed() model will fit, and your code should get further.

I mean - the GlobalAveragePooling1D layer is reducing your "time" dimension, so the shape passed to TimeDistributed is a 2D tensor, while it should be at least a 3D tensor.
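To make the shape change described above concrete, here is a minimal plain-Python sketch (not Keras code) of averaging over the "time" axis. The helper names `global_average_pool_1d` and `shape_of` are made up for illustration; the only point is that a `(batch, steps, features)` input becomes `(batch, features)`.

```python
def global_average_pool_1d(batch):
    """Average over the steps axis: (batch, steps, features) -> (batch, features)."""
    pooled = []
    for sequence in batch:
        steps = len(sequence)
        features = len(sequence[0])
        # One mean per feature, taken across all steps of this sequence.
        pooled.append([sum(step[f] for step in sequence) / steps
                       for f in range(features)])
    return pooled


def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]  # 1 sequence, 3 steps, 2 features
pooled = global_average_pool_1d(batch)
print(shape_of(batch), "->", shape_of(pooled))
```

The steps axis disappears, which is why printing the shape before and after the pooling layer is a quick way to check what a wrapping layer will receive.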

nectario commented 4 years ago

The input to TimeDistributed is already 3D. I created two simpler examples, and both use GlobalAveragePooling1D as above. Can you explain why one works and the other doesn't? See the simplified examples below. The first example uses an LSTM whose output is passed to a GlobalAveragePooling1D layer. The second example uses BERT.

`
# ----------------------- WORKS -----------------------
def example_1(bert_config_file=None, bert_ckpt_file=None):
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        bert_params = params_from_pretrained_ckpt(bert_ckpt_file)
        l_bert = BertModelLayer.from_params(bert_params, name="bert")

    in_sentence = Input(shape=(128,), dtype='int64', name="Input1")
    embedded_sentence = Embedding(1000, 300, trainable=False)(in_sentence)
    lstm_sentence = LSTM(300, return_sequences=True)(embedded_sentence)
    bert_output = GlobalAveragePooling1D()(lstm_sentence)

    sentence_model = Model(in_sentence, bert_output)

    section_input = Input(shape=(None, None), dtype='int64', name="Input2")
    section_encoded = TimeDistributed(sentence_model)(section_input)
    section_encoded = LSTM(300)(section_encoded)
    section_encoded = Dense(21)(section_encoded)
    section_model = Model(section_input, section_encoded)

    section_model.compile(optimizer="adam",
                          loss="binary_crossentropy")

    sentence_model.summary()
    section_model.summary()

    return section_model

# ----------------------- DOES NOT WORK -----------------------
def example_2(bert_config_file=None, bert_ckpt_file=None):
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        bert_params = params_from_pretrained_ckpt(bert_ckpt_file)
        l_bert = BertModelLayer.from_params(bert_params, name="bert")

    in_sentence = Input(shape=(128,), dtype='int64', name="Input1")
    bert_output = l_bert(in_sentence)
    bert_output = GlobalAveragePooling1D()(bert_output)

    sentence_model = Model(in_sentence, bert_output)

    section_input = Input(shape=(None, None), dtype='int64', name="Input2")
    section_encoded = TimeDistributed(sentence_model)(section_input)
    section_encoded = LSTM(300)(section_encoded)
    section_encoded = Dense(21)(section_encoded)
    section_model = Model(section_input, section_encoded)

    section_model.compile(optimizer="adam",
                          loss="binary_crossentropy")

    sentence_model.summary()
    section_model.summary()

    return section_model
`

kpe commented 4 years ago

have you tried instrumenting your code with the two print statements above? Do you have the output?

nectario commented 4 years ago

Yep. See the output for both methods below:

BERT
bert_output (None, 128, 768)
bert_output (None, 768)

LSTM (Passes)
lstm_output (None, 128, 300)
lstm_output (None, 300)

kpe commented 4 years ago

Ok, I see, there is indeed a problem in the BertModelLayer: compute_output_shape() is not implemented, therefore it returns the input shape, while it should be input_shape + [hidden_size]. Let me commit this now, and you should get a fix in a few minutes! Thank you very much for finding this issue!
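The diagnosis above can be reproduced without TensorFlow. The following is a simplified plain-Python model of static shape inference, not the actual Keras or bert-for-tf2 code: `PoolingShape` copies the shape rule visible in the traceback (`[input_shape[0], input_shape[2]]`), while `BrokenEncoderShape`, `FixedEncoderShape` and `infer_shape` are hypothetical stand-ins for a layer that reports its input shape unchanged and one that appends `hidden_size`.

```python
class PoolingShape:
    """Shape rule taken from the traceback: keep batch and feature axes."""

    def compute_output_shape(self, input_shape):
        return [input_shape[0], input_shape[2]]


class BrokenEncoderShape:
    """Hypothetical stand-in: wrongly reports the input shape as its output shape."""

    def compute_output_shape(self, input_shape):
        return list(input_shape)


class FixedEncoderShape:
    """Hypothetical stand-in: output shape is input_shape + [hidden_size]."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def compute_output_shape(self, input_shape):
        return list(input_shape) + [self.hidden_size]


def infer_shape(layers, input_shape):
    """Chain compute_output_shape() calls without running any data through."""
    shape = list(input_shape)
    for layer in layers:
        shape = layer.compute_output_shape(shape)
    return shape


token_ids_shape = [None, 128]  # (batch, max_seq_len)

# With the fix, pooling sees a 3D shape and returns (batch, hidden_size).
print(infer_shape([FixedEncoderShape(768), PoolingShape()], token_ids_shape))

# Without it, pooling sees a 2D shape and input_shape[2] does not exist.
try:
    infer_shape([BrokenEncoderShape(), PoolingShape()], token_ids_shape)
except IndexError as err:
    print("IndexError:", err)
```

This is also consistent with the printed tensor shapes looking correct in both examples: judging by the traceback, the failing call goes through TimeDistributed's `compute_output_shape()` path, which relies on each layer's declared shape rather than on the tensors themselves.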

nectario commented 4 years ago

Perfect! Thank you!!

nectario commented 4 years ago

Great, that error is gone! Now, after adding the last two lines below, I get this error:

    def bert_model(self, bert_config_file=None, bert_ckpt_file=None):
        with tf.io.gfile.GFile(bert_config_file, "r") as reader:
            bert_params = params_from_pretrained_ckpt(bert_ckpt_file)
            l_bert = BertModelLayer.from_params(bert_params, name="bert")

            l_bert.apply_adapter_freeze()
            l_bert.embeddings_layer.trainable = False
        ...

Traceback (most recent call last):
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 809, in <module>
    validation_labels=genre_prediction.validation_labels, epochs=1000, batch_size=1)
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 512, in fit_bert
    self.model = self.bert_2(bert_ckpt_file="D:/Development/Projects/bert_models/"+self.bert_model_name, bert_config_file="D:/Development/Projects/bert_models/"+self.bert_model_name+"/bert_config.json")
  File "C:/Development/Projects/GenrePrediction/GenrePredictionModel.py", line 345, in bert_2
    l_bert.embeddings_layer.trainable = False
AttributeError: 'NoneType' object has no attribute 'trainable'

kpe commented 4 years ago

could you check #29? Currently you can freeze the weights only after the model has been built (actually, in this case I can instantiate the inner layers in the constructor instead of in build()), so either wait a few minutes until I trigger a new build, or set trainable=False once the model is built/compiled.
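The ordering problem described above can be illustrated with a small plain-Python sketch. `LazyEncoder` and `SubLayer` are hypothetical classes, not the bert-for-tf2 implementation; they only assume, as the comment says, that the inner layers are created in `build()` rather than in the constructor.

```python
class SubLayer:
    """Hypothetical inner layer with a trainable flag."""

    def __init__(self):
        self.trainable = True


class LazyEncoder:
    """Hypothetical layer that creates its sublayers in build(), not __init__()."""

    def __init__(self):
        self.embeddings_layer = None  # not created until build() runs

    def build(self):
        self.embeddings_layer = SubLayer()


encoder = LazyEncoder()

# Before build(): the attribute is still None, so freezing fails.
try:
    encoder.embeddings_layer.trainable = False
except AttributeError as err:
    print("AttributeError:", err)

# After build(): the sublayer exists and can be frozen.
encoder.build()
encoder.embeddings_layer.trainable = False
print(encoder.embeddings_layer.trainable)
```

Creating the sublayers in the constructor instead, as the comment proposes, would remove the ordering constraint because the attribute would never be `None`.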

kpe commented 4 years ago

ok, committed - thank you once again!

nectario commented 4 years ago

Awesome, thank you so much!