Open AnandA777 opened 2 years ago
Hello, have you solved this issue? Thanks! I also tried this: "bert-base-uncased" works well, but once I changed to "microsoft/deberta-v2-xlarge" (the same as the example code in the docs: https://huggingface.co/docs/transformers/model_doc/deberta), I got an error: TypeError Traceback (most recent call last)
What output did you get? I also tried 'microsoft/deberta-base'; the TypeError is gone, but it says: Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.dense.weight']
I've tried it using the pipeline feature from Transformers and the output is not very good:
from transformers import pipeline
mask_pipeline = pipeline("fill-mask", model="microsoft/deberta-v2-xlarge", tokenizer="microsoft/deberta-v2-xlarge")
mask_pipeline("The capital of France is [MASK] .")
It outputs:
[{'score': 0.011109163984656334,
'token': 125824,
'token_str': '숨',
'sequence': 'The capital of France is숨.'},
{'score': 0.010662203654646873,
'token': 123278,
'token_str': '앞',
'sequence': 'The capital of France is앞.'},
{'score': 0.005203052423894405,
'token': 127977,
'token_str': '짧',
'sequence': 'The capital of France is짧.'},
{'score': 0.005113440100103617,
'token': 110702,
'token_str': 'leroy',
'sequence': 'The capital of France is leroy.'},
{'score': 0.004787191282957792,
'token': 121305,
'token_str': 'MVNO',
'sequence': 'The capital of France is MVNO.'}]
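A quick way to read the output above: the best score is only about 0.011 and the candidates are unrelated tokens, which is consistent with a prediction head that was not loaded from pre-trained weights. Below is a minimal sketch that flags such results. The helper name `looks_untrained` and the 0.05 threshold are my own assumptions for illustration, not part of Transformers; the scores are copied from the output above.

```python
def looks_untrained(predictions, min_top_score=0.05):
    """Return True when the best fill-mask score is suspiciously low.

    `predictions` is a list of dicts shaped like the fill-mask pipeline output.
    The default threshold is an arbitrary assumption, not a library default.
    """
    if not predictions:
        return True
    top_score = max(p["score"] for p in predictions)
    return top_score < min_top_score


# Scores and tokens copied from the output above (other fields omitted).
deberta_output = [
    {"score": 0.011109163984656334, "token_str": "숨"},
    {"score": 0.010662203654646873, "token_str": "앞"},
    {"score": 0.005203052423894405, "token_str": "짧"},
    {"score": 0.005113440100103617, "token_str": "leroy"},
    {"score": 0.004787191282957792, "token_str": "MVNO"},
]

print(looks_untrained(deberta_output))
```

This is only a heuristic: a low top score can also come from a genuinely ambiguous prompt, so it is best used alongside the loading warnings rather than instead of them.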
Pinging @BigBird01 here, maybe there's something wrong with the MLM functionality itself :thinking:
@maggiezha I don't have the exact output, but the highest scoring "words" were mostly symbols; nothing related to the input.
Got the same issue when using HF DebertaForMaskedLM.
The thing is that the microsoft/deberta-base checkpoint and other similar ones don't have the pre-trained weights of the Masked LM head, which are the following parameters in DebertaForMaskedLM:
['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
That's why there are such warnings:
Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.dense.weight']
- This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
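To make the mismatch in the two warnings concrete, here is a small sketch that compares the parameter names they list. The helper name `diff_keys` is hypothetical; the two key lists are copied from the warning text and nothing is loaded from the Hub.

```python
def diff_keys(checkpoint_keys, model_keys):
    """Split parameter names into unused (checkpoint only) and missing (model only)."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    unused = sorted(checkpoint_keys - model_keys)
    missing = sorted(model_keys - checkpoint_keys)
    return unused, missing


# Names the checkpoint provides but DebertaForMaskedLM did not use.
checkpoint_side = [
    "lm_predictions.lm_head.LayerNorm.bias",
    "lm_predictions.lm_head.bias",
    "lm_predictions.lm_head.LayerNorm.weight",
    "lm_predictions.lm_head.dense.bias",
    "deberta.embeddings.position_embeddings.weight",
    "lm_predictions.lm_head.dense.weight",
]

# Names DebertaForMaskedLM expects but the checkpoint did not provide.
model_side = [
    "cls.predictions.decoder.weight",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.dense.weight",
    "cls.predictions.bias",
]

unused, missing = diff_keys(checkpoint_side, model_side)
print("unused by the model:", unused)
print("newly initialized:", missing)
```

Note that the checkpoint does list parameters under `lm_predictions.lm_head.*`, while the model looks for `cls.predictions.*`. Whether those head weights can be mapped onto the names the model expects is not something this thread establishes.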
Hope that the authors can upload/update those params soon :) @BigBird01
Hi, has anybody solved this issue? It would be very helpful. Thanks!
Got the same issue, it seems like some params of LM head are not uploaded.
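One way to check this programmatically, rather than reading the warning text, is to ask `from_pretrained` for its loading report via `output_loading_info=True`. The sketch below is untested against the exact versions in this thread; the helper names `head_is_newly_initialized` and `check_checkpoint` and the prefix list are my own assumptions. The import is kept inside the function so the pure helper can be used without Transformers installed.

```python
def head_is_newly_initialized(missing_keys, head_prefixes=("cls.", "lm_predictions.")):
    """Return True if any missing parameter name belongs to the LM head.

    The prefixes are an assumption based on the names quoted in the warnings.
    """
    return any(key.startswith(head_prefixes) for key in missing_keys)


def check_checkpoint(name):
    """Load a masked-LM checkpoint and report whether its head was randomly initialized.

    Requires the `transformers` package and downloads the checkpoint.
    """
    from transformers import AutoModelForMaskedLM  # imported lazily on purpose

    _, info = AutoModelForMaskedLM.from_pretrained(name, output_loading_info=True)
    # `info` is a dict that includes "missing_keys" and "unexpected_keys".
    return head_is_newly_initialized(info["missing_keys"])


# Example with the names from the warning, no download needed:
print(head_is_newly_initialized(["cls.predictions.bias", "cls.predictions.decoder.weight"]))
```

Calling `check_checkpoint("microsoft/deberta-base")` would then be expected to return `True` if the warning quoted earlier still applies to that checkpoint.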
I have been trying to use the pretrained DebertaV2ForMaskedLM based on the example code, but it is not working. The following BERT code (for which the example code looks basically identical) works as expected:
However, substituting in the following options for the first two lines does not work:
Using either of these options results in nonsense output. Is the documentation missing something?
I am using:
Python 3.9.6
torch 1.9.0
transformers 4.12.5