microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Training mDeBERTaV3 with Simple Transformers not successful: macro, micro f1: 0.003, 0.035 #84

Closed TajaKuzman closed 2 years ago

TajaKuzman commented 2 years ago

Hello,

I would like to fine-tune mDeBERTaV3 for a genre classification task and compare it to XLM-RoBERTa and some other similar models, but training gives very low results (macro, micro F1: 0.003, 0.035; high running loss: 3.0456), and the confusion matrix shows that the model predicts a single class for all instances (a different class in different runs). Training other models (XLM-RoBERTa, SloBERTa, BERTić, etc.) with the same setup (only the model type and model name are changed for each model; otherwise the code and dataset are the same) works without any problems.

Here are the hyperparameters:

from simpletransformers.classification import ClassificationModel

# LABELS is the list of 21 genre label names (defined elsewhere; not shown here).
model_args = {"overwrite_output_dir": True,
             "num_train_epochs": 90,
             "labels_list": LABELS,
             "learning_rate": 1e-5,
             "train_batch_size": 32,
             "no_cache": True,
             "no_save": True,
             "max_seq_length": 300,
             "save_steps": -1
             }

# mDeBERTa-v3 is loaded through the DeBERTa-v2 model classes, hence the "debertav2" model type.
debertav3_model = ClassificationModel(
        "debertav2", "microsoft/mdeberta-v3-base",
        num_labels=21,
        use_cuda=True,
        args=model_args
    )
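
For context, training and evaluation are invoked roughly like this (a sketch, not the original code; train_df and eval_df stand in for the train/test DataFrames with "text" and "labels" columns, which are not shown in the issue):

from sklearn.metrics import f1_score

# Fine-tune on the training split.
debertav3_model.train_model(train_df)

# Evaluate on the held-out split; extra keyword metrics are called as f(true_labels, predictions).
result, model_outputs, wrong_predictions = debertav3_model.eval_model(
    eval_df,
    f1_macro=lambda y_true, y_pred: f1_score(y_true, y_pred, average="macro"),
    f1_micro=lambda y_true, y_pred: f1_score(y_true, y_pred, average="micro"),
)
print(result)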

The training runs to completion without errors, but some warning messages appear that might be related to the low performance:

  1. When loading the pre-trained model:
    Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

This makes me think the problem might be with the model type, but changing it to "deberta-v2", "debertav3", or "deberta" results in an error. (See the sketch after the warnings below for a way to check the tokenizer and embedding sizes directly.)

  2. When training:
    /opt/conda/lib/python3.7/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:1313: UserWarning: This overload of nonzero is deprecated:
    nonzero()
    Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
    label_index = (labels >= 0).nonzero()
    /opt/conda/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
    "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

I'm working on Kaggle with the following versions: pytorch>=1.6, cudatoolkit=11.0, simpletransformers==0.63.3, torch==1.6.0+cu101.

Thank you very much in advance for your help!

TajaKuzman commented 2 years ago

It turns out the reason it isn't working is probably that deberta-v2 is not yet supported as a text classification model type in Simple Transformers.

danil31219as commented 2 years ago

set fp16=False
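
In the configuration from the original post, that amounts to disabling mixed-precision training in the model args, roughly like this (a sketch based on the snippet above):

# fp16 (mixed-precision/AMP) training defaults to True in Simple Transformers;
# turning it off avoids the degenerate single-class predictions reported above.
model_args["fp16"] = False

debertav3_model = ClassificationModel(
    "debertav2", "microsoft/mdeberta-v3-base",
    num_labels=21,
    use_cuda=True,
    args=model_args
)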

TajaKuzman commented 2 years ago

Thank you, it now works!