microsoft / DeBERTa

The implementation of DeBERTa

Pretrained models for masked LM do not work as expected #74

Open AnandA777 opened 2 years ago

AnandA777 commented 2 years ago

I have been trying to use the pretrained DebertaV2ForMaskedLM based on the example code, but it is not working. The following BERT code (to which the DeBERTa example code is basically identical) works as expected:

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
outputs = model(**inputs, labels=labels)
word_indices = torch.argmax(outputs["logits"], dim=2)[0]
print(tokenizer.decode(word_indices))  # prints ". the capital of france is paris.."

However, substituting in the following options for the first two lines does not work:

# Option 1: DeBERTa v2
tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = DebertaV2ForMaskedLM.from_pretrained("microsoft/deberta-v2-xlarge")

# Option 2: DeBERTa v1
tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaForMaskedLM.from_pretrained("microsoft/deberta-base")

Using either of these options results in nonsense output. Is the documentation missing something?
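
For reference, the full DeBERTa-v2 variant with the imports spelled out looks like this (a minimal sketch; the labels line is dropped here since only the logits are needed to decode a prediction):

import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForMaskedLM

tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = DebertaV2ForMaskedLM.from_pretrained("microsoft/deberta-v2-xlarge")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
outputs = model(**inputs)
word_indices = torch.argmax(outputs["logits"], dim=2)[0]
print(tokenizer.decode(word_indices))  # decodes to unrelated tokens instead of "paris"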

I am using:

Python 3.9.6
torch 1.9.0
transformers 4.12.5

maggiezha commented 2 years ago

Hello, have you solved this issue? Thanks! I also tried it and "bert-base-uncased" works well, but once I changed to "microsoft/deberta-v2-xlarge" (same as the example code in the doc: https://huggingface.co/docs/transformers/model_doc/deberta), I got an error:

TypeError                                 Traceback (most recent call last)
in ()
      6 model = DebertaV2ForMaskedLM.from_pretrained('microsoft/deberta-v2-xlarge')
      7
----> 8 inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
      9 labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
     10
TypeError: 'NoneType' object is not callable
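
The traceback fails at the tokenizer call, which means the tokenizer variable itself ended up as None. A minimal sanity check (the sentencepiece explanation in the comments is an assumption, not something confirmed in this thread):

from transformers import DebertaV2Tokenizer

# DebertaV2Tokenizer is sentencepiece-based; a missing sentencepiece package is a
# common reason (assumed here, not confirmed) for the tokenizer ending up unusable.
print(DebertaV2Tokenizer)  # should print the tokenizer class, not None

tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
print(type(tokenizer))     # should be DebertaV2Tokenizer
# If the class or the instance is None, try: pip install sentencepiece (and restart the runtime)
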
maggiezha commented 2 years ago

What output did you get? I also tried 'microsoft/deberta-base', the TypeError is gone, but it says: Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.dense.weight']

stefan-it commented 2 years ago

I've tried it using the pipeline feature from Transformers and the output is not very good:

from transformers import pipeline
mask_pipeline = pipeline("fill-mask", model="microsoft/deberta-v2-xlarge", tokenizer="microsoft/deberta-v2-xlarge")

mask_pipeline("The capital of France is [MASK] .")

It outputs:

[{'score': 0.011109163984656334,
  'token': 125824,
  'token_str': '숨',
  'sequence': 'The capital of France is숨.'},
 {'score': 0.010662203654646873,
  'token': 123278,
  'token_str': '앞',
  'sequence': 'The capital of France is앞.'},
 {'score': 0.005203052423894405,
  'token': 127977,
  'token_str': '짧',
  'sequence': 'The capital of France is짧.'},
 {'score': 0.005113440100103617,
  'token': 110702,
  'token_str': 'leroy',
  'sequence': 'The capital of France is leroy.'},
 {'score': 0.004787191282957792,
  'token': 121305,
  'token_str': 'MVNO',
  'sequence': 'The capital of France is MVNO.'}]
stefan-it commented 2 years ago

Pinging @BigBird01 here, maybe there's something wrong with the MLM functionality itself :thinking:

AnandA777 commented 2 years ago

@maggiezha I don't have the exact output, but the highest scoring "words" were mostly symbols; nothing related to the input.

tqfang commented 2 years ago

Got the same issue when using HF DebertaForMaskedLM.

The thing is that the checkpoints provided by microsoft/deberta-base and other similar repos don't include the pre-trained weights of the masked LM head, which correspond to the following parameters in DebertaForMaskedLM:

['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']

That's why there are such warnings:

What output did you get? I also tried 'microsoft/deberta-base', the TypeError is gone, but it says: Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.dense.weight']

  • This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
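
For anyone who wants to verify which weights are skipped or re-initialized, here is a minimal sketch using the output_loading_info flag of from_pretrained (it only inspects loading metadata, it does not fix anything):

from transformers import DebertaForMaskedLM

# Ask transformers to report how the checkpoint weights were mapped onto the class.
model, info = DebertaForMaskedLM.from_pretrained(
    "microsoft/deberta-base", output_loading_info=True
)
print(info["missing_keys"])     # 'cls.predictions.*' params that had to be newly initialized
print(info["unexpected_keys"])  # 'lm_predictions.lm_head.*' params the HF class did not use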

Hope the authors can upload/update those params soon :) @BigBird01

yardenTal1 commented 2 years ago

Hi, has anybody solved this issue? It would be very helpful. Thanks!

yardenTal1 commented 2 years ago

Pinging @BigBird01 here, maybe there's something wrong with the MLM functionality itself....

y1ny commented 2 years ago

Got the same issue; it seems like some params of the LM head have not been uploaded.
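
One way to double-check what the uploaded checkpoint actually contains is to load it directly and list the LM-head-related keys. A sketch (it assumes the repo still ships a pytorch_model.bin; the filename may differ):

import torch
from huggingface_hub import hf_hub_download

# Download the raw checkpoint file and inspect its keys.
path = hf_hub_download(repo_id="microsoft/deberta-base", filename="pytorch_model.bin")
state_dict = torch.load(path, map_location="cpu")
print([k for k in state_dict if "predictions" in k or "lm_head" in k])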