huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Model predictions wrong #8711

Closed brunopistone closed 3 years ago

brunopistone commented 3 years ago

Information

Model I am using (Bert, XLNet ...): Bert -> bert-base-uncased

To reproduce

Steps to reproduce the behavior:

Hi @LysandreJik, @sgugger, @jplu, I was running my own script on a custom dataset using "bert-base-uncased". It's a simple classification task with two classes. Below are some examples:

"is_offensive", "text"
"1", "Your service is a shit."
"0", "Really great examples. Thank you for your help @exemple01"

This is the definition of the model:

import tensorflow as tf
from transformers import AutoConfig, BertTokenizer, TFAutoModel

config = AutoConfig.from_pretrained("bert-base-uncased")
config.output_hidden_states = False

model_bert = TFAutoModel.from_pretrained("bert-base-uncased", config=config)

# extract the underlying TFBertMainLayer from the loaded model
bert_main_layer = model_bert.bert

input_ids_in = tf.keras.layers.Input(shape=(333,), name='input_token', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(333,), name='masked_token', dtype='int32')

# the main layer returns (sequence_output, pooled_output)
sequence_output, pooled_output = bert_main_layer(input_ids_in, attention_mask=input_masks_in)

X = tf.keras.layers.Dropout(0.2)(pooled_output)
X = tf.keras.layers.Dense(2, activation='softmax')(X)

loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model = tf.keras.Model(
        inputs=[input_ids_in, input_masks_in],
        outputs=[X]
)

for layer in model.layers[:3]:
        layer.trainable = False

model.compile(optimizer=tf.optimizers.Adam(lr=0.00001), loss=loss_function, metrics=['sparse_categorical_accuracy'])

history = model.fit(
            X_train,
            y_train,
            validation_split=0.2,
            epochs=10,
            batch_size=100
        )
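
For context, X_train and y_train are not defined in the snippet above. A minimal preprocessing sketch that would produce inputs of the shape this model expects, assuming the CSV layout from the example (the file name "dataset.csv" is hypothetical):

import pandas as pd
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# two-column CSV as in the example above: "is_offensive", "text"
df = pd.read_csv("dataset.csv")

# pad/truncate every sentence to the fixed length of the Input layers (333)
enc = tokenizer(
    df["text"].tolist(),
    max_length=333,
    padding="max_length",
    truncation=True,
    return_tensors="np",
)

X_train = [enc["input_ids"], enc["attention_mask"]]
y_train = df["is_offensive"].astype(int).values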

I've trained the model for 5 epochs; these are the results after the last epoch:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_token (InputLayer)        [(None, 333)]        0                                            
__________________________________________________________________________________________________
masked_token (InputLayer)       [(None, 333)]        0                                            
__________________________________________________________________________________________________
bert (TFBertMainLayer)          ((None, 333, 768), ( 109482240   input_token[0][0]                
                                                                 masked_token[0][0]               
__________________________________________________________________________________________________
dropout_75 (Dropout)            (None, 768)          0           bert[0][1]                       
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2)            1538        dropout_75[0][0]                 
==================================================================================================
Total params: 109,483,778
Trainable params: 1,538
Non-trainable params: 109,482,240
__________________________________________________________________________________________________
1475/1475 [==============================] - ETA: 0s - loss: 0.5041 - accuracy: 0.8028
Accuracy: 0.8027665019035339

Loss: 0.5041469931602478

Val Accuracy: 0.8009492754936218

Then I save the model in this way:

import os
import traceback

try:
    modelName = os.path.join(model_path, model_name)

    # serialize the architecture to JSON and the weights to HDF5
    model_json = model.to_json()
    with open(modelName + ".json", "w") as json_file:
        json_file.write(model_json)

    model.save_weights(modelName + ".h5")

    logger.info("Saved {} to disk".format(modelName))
except Exception as e:
    stacktrace = traceback.format_exc()
    logger.error("{}".format(stacktrace))

    raise e
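
One detail worth checking on the reload side (not shown in the thread, so this is a sketch of my assumption about the reload path): a Keras model rebuilt with model_from_json needs the custom BERT layer registered via custom_objects before the weights can be restored, and the TFBertMainLayer import path varies across transformers versions.

import tensorflow as tf
from transformers.models.bert.modeling_tf_bert import TFBertMainLayer

# rebuild the architecture, telling Keras about the custom BERT layer
with open(modelName + ".json") as json_file:
    loaded_model = tf.keras.models.model_from_json(
        json_file.read(),
        custom_objects={"TFBertMainLayer": TFBertMainLayer},
    )

# restore the trained weights
loaded_model.load_weights(modelName + ".h5")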

When I try to perform a prediction, even on sentences from the training set, the model completely misses the goal. I think something is wrong in the training results: I can't get ~81% accuracy during training and validation and then an accuracy near 10% when I validate the model on a completely new dataset.
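
For reference, a minimal inference sketch (mine, not from the issue), assuming predictions use exactly the same tokenizer settings as training (max_length=333, padding to max length); a preprocessing mismatch between training and inference is a common cause of this symptom:

import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def predict_labels(sentences, model, max_len=333):
    # tokenize exactly as at training time: same length, same padding
    enc = tokenizer(
        sentences,
        max_length=max_len,
        padding="max_length",
        truncation=True,
        return_tensors="np",
    )
    probs = model.predict([enc["input_ids"], enc["attention_mask"]])
    return np.argmax(probs, axis=-1)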

I decided to build my own model and compared your framework with another one, which gives good results (near 85%).

Can you help me to understand the mistakes? Thank you.

LysandreJik commented 3 years ago

If you have an accuracy near 10% on a two-label sequence classification task, does that mean it gets 90% of the results wrong? If so, you might just have switched the labels.

brunopistone commented 3 years ago

Hi, no, the problem is not related to that. I also tried one-hot encoding the labels and changing the loss function to "categorical_crossentropy", but the results are the same. I tried the official pre-trained English model (https://github.com/google-research/bert) with another module and I don't have this problem (the Keras model is the same).
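
As a side note on the loss configuration in the original snippet (my observation, not something raised in the thread): from_logits has to match the final activation, otherwise softmax is effectively applied twice. A minimal sketch of the two consistent pairings:

import tensorflow as tf

# pairing A: softmax output layer + from_logits=False
head_softmax = tf.keras.layers.Dense(2, activation="softmax")
loss_softmax = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

# pairing B: linear output layer (raw logits) + from_logits=True
head_logits = tf.keras.layers.Dense(2)
loss_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)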

jplu commented 3 years ago

Hello!

Can you try with TFBertForSequenceClassification?
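
A minimal sketch of that suggestion, assuming the inputs are tokenized as in the preprocessing sketch above (the hyperparameters are placeholders, not values from the thread):

import tensorflow as tf
from transformers import TFBertForSequenceClassification

# the library attaches and initializes the classification head itself
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    # this model outputs raw logits, so from_logits=True is correct here
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
    y_train,
    validation_split=0.2,
    epochs=5,
    batch_size=32,
)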

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.