mawilson1234 opened 1 year ago
Hey! Did you find a solution/cause yet? I am experiencing the same issues with deberta-v3-base even though I pre-trained the model on my own training data...
No dice, but I discovered the problem is worse than just mask filling; it doesn't even produce the right predictions for the given (unmasked) tokens.
>>> import torch
>>> from transformers import AutoModelForMaskedLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base')
>>> model = AutoModelForMaskedLM.from_pretrained('microsoft/deberta-v3-base')
>>> text = 'Do you [MASK] the muffin man?'
>>> inputs = tokenizer(text, return_tensors='pt')
>>> # double-checking the tokenization round-trips
>>> tokenizer.batch_decode(inputs['input_ids'])
['Do you [MASK] the muffin man?']  # all good
>>> with torch.no_grad():
...     outputs = model(**inputs)
...
>>> tokenizer.batch_decode(torch.argmax(outputs.logits, dim=-1))
['ût slimnatch Laughternatchilia Arrijailût']  # ???
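(Side note for anyone reproducing this: the argmax above runs over every position, unmasked tokens included. The same indexing trick can be used to inspect only the [MASK] position. A minimal sketch with toy stand-in tensors — the ids and vocab size below are made up for illustration; in the real repro they'd come from the tokenizer and model above:)

```python
import torch

# Toy stand-ins for tokenizer/model output; ids and vocab size are made up,
# not real DeBERTa values.
mask_token_id = 4                                  # pretend [MASK] id
input_ids = torch.tensor([[1, 10, 11, 4, 12, 13, 14, 15, 2]])
logits = torch.randn(1, 9, 100)                    # (batch, seq_len, vocab)

# Find the [MASK] position instead of argmax-ing over every position.
mask_pos = (input_ids[0] == mask_token_id).nonzero(as_tuple=True)[0]
top5_ids = logits[0, mask_pos[0]].topk(5).indices  # top-5 candidate token ids
print(mask_pos.tolist(), tuple(top5_ids.shape))
```

With a real model you'd pass `top5_ids.tolist()` to `tokenizer.convert_ids_to_tokens` to see the candidate fillers.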
I'd have thought it was something with the tokenizer, except that you say you had the same issue with your pre-trained model. Do you know whether the same thing happens at all positions for your model?
Edit: Found #18674, which references this. Looks like it's been around for a while and is being worked on.
Hey! I just came back from holidays and will have a look when I can. Note that DeBERTa should be refactored soon; follow #22105 if you want to know more. This will be looked at as part of that fix!
Hope to get to this by the end of the summer!
I'm leaving this open to the community; I did not have the bandwidth to address it :(
System Info
Python version: 3.8.15
Transformers version: 4.24.0
Who can help?
@ArthurZucker, @younesbelkada
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
Both on the HF website and using transformers in Python scripts/the interpreter, the DeBERTa models seem to produce nonsense outputs in a fill-mask task. This is demonstrated below using a fill-mask pipeline for ease of reproduction, but the same thing happens even when calling the models manually and inspecting the logits. I demonstrate with one model, but the other microsoft/deberta masked language models appear to have the same issue (i.e., not the ones fine-tuned on MNLI and so on, which I wouldn't test this against).

Here's a screenshot from the HF website for the same model (microsoft/deberta-v3-large):

Based on the paper and the documentation on the model cards, it seems like these models should be usable for masked language modeling out of the box, since that's what they were pre-trained on, but they're clearly not doing a good job of it. Am I missing something about why these models shouldn't be used for MLM without fine-tuning, or is there a bug with them?
Expected behavior
I'd expect sensible predictions for masked token locations (assuming these models can indeed be used for that without additional fine-tuning).