kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

Finetune Albert on MovieReview dataset #53

Closed PhilippMarquardt closed 4 years ago

PhilippMarquardt commented 4 years ago

Hi! I tried finetuning the ALBERT base/large model on the MovieReview dataset that is used in the BERT example. The model is created like this:

```python
def create_model(max_seq_len):
    albert_model_name = "albert_base"
    albert_dir = bert.fetch_tfhub_albert_model(albert_model_name, ".models")
    model_params = bert.albert_params(albert_dir)
    l_bert = bert.BertModelLayer.from_params(model_params, name="albert")

    input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
    # token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="token_type_ids")
    # output         = l_bert([input_ids, token_type_ids])
    output         = l_bert(input_ids)

    print("bert shape", output.shape)
    cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)
    cls_out = keras.layers.Dropout(0.5)(cls_out)
    logits = keras.layers.Dense(units=1024, activation="tanh")(cls_out)
    logits = keras.layers.Dropout(0.5)(logits)
    logits = keras.layers.Dense(units=2, activation="softmax")(logits)

    # model = keras.Model(inputs=[input_ids, token_type_ids], outputs=logits)
    # model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])
    model = keras.Model(inputs=input_ids, outputs=logits)
    model.build(input_shape=(None, max_seq_len))
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")])

    # load the pre-trained model weights
    bert.load_albert_weights(l_bert, albert_dir)

    model.summary()

    return model
```

I've used `from bert.tokenization.albert_tokenization import FullTokenizer` for tokenization.

Everything else is as in the provided BERT example.

When executing the training loop, the accuracy stays at 50% and the loss doesn't really move from 0.7.

Has anybody successfully finetuned a pretrained ALBERT model on the MovieReview dataset? If so, what am I doing wrong? Thanks in advance!
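One detail worth checking in the snippet above: the final `Dense` layer already applies a `softmax`, while the loss is constructed with `from_logits=True`, so Keras effectively applies a softmax a second time. Double-softmaxed outputs are squashed toward uniform, which is consistent with the reported symptoms: accuracy stuck at 50% and a loss hovering near 0.7 (≈ ln 2, the cross-entropy of a uniform two-class prediction). A minimal sketch of the squashing effect, using a plain NumPy softmax so no TensorFlow is needed:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[4.0, -4.0]])   # a confident 2-class prediction
probs  = softmax(logits)           # what Dense(..., activation="softmax") emits
double = softmax(probs)            # what from_logits=True then does to it

print(probs)   # ~[[0.9997, 0.0003]] -- confident
print(double)  # ~[[0.73, 0.27]]    -- squashed toward uniform
```

The usual fix is to pick one side: either drop the `softmax` activation from the last `Dense` layer and keep `from_logits=True`, or keep the `softmax` and compile with `from_logits=False` (the Keras default).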

kpe commented 4 years ago

@PhilippMarquardt - it might be that your learning rate is too high. Try different learning rates, e.g. somewhere around 1e-5, or check the examples under https://github.com/kpe/bert-for-tf2/tree/master/examples

raviolli commented 4 years ago

Any luck with this? I set the LR to 5e-5, but nothing changed; same problem.

eschibli commented 4 years ago

I have the same issue with outputs never evolving away from 0.5 and loss remaining at 0.7 regardless of learning rate or optimizer. Could you confirm that it is working and post an example if possible?

Ritika2001 commented 4 years ago

I am having the same issue. The model is predicting everything as one class.