huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Weird output when using unexpected model type for pipelines #5678

Closed JetRunner closed 4 years ago

JetRunner commented 4 years ago

🐛 Bug

Information

Model I am using (Bert, XLNet ...): CodeBERT

Language I am using the model on (English, Chinese ...): Code

The problem arises when using:

The task I am working on is:

To reproduce

Steps to reproduce the behavior:

This is the correct code, with the expected output:

from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

# Load the checkpoint with its masked-LM head, plus the matching tokenizer.
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')
tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')

CODE = "if (x is not None) <mask> (x>1)"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

outputs = fill_mask(CODE)
print(outputs)

Output:

[{'sequence': '<s>if (x is not None) and(x>1)</s>', 'score': 0.7236990928649902, 'token': 8, 'token_str': 'Ġand'}, {'sequence': '<s>if (x is not None) &(x>1)</s>', 'score': 0.10633797943592072, 'token': 359, 'token_str': 'Ġ&'}, {'sequence': '<s>if (x is not None)and(x>1)</s>', 'score': 0.021604137495160103, 'token': 463, 'token_str': 'and'}, {'sequence': '<s>if (x is not None) AND(x>1)</s>', 'score': 0.02122747339308262, 'token': 4248, 'token_str': 'ĠAND'}, {'sequence': '<s>if (x is not None) if(x>1)</s>', 'score': 0.016991324722766876, 'token': 114, 'token_str': 'Ġif'}]

But if we load the model with RobertaModel and proceed with the same pipeline:

from transformers import RobertaTokenizer, RobertaModel, pipeline

# Same checkpoint, but loaded as the bare encoder (RobertaModel), with no masked-LM head.
model = RobertaModel.from_pretrained('microsoft/codebert-base-mlm')
tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')

CODE = "if (x is not None) <mask> (x>1)"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

outputs = fill_mask(CODE)
print(outputs)

Then the output makes no sense at all:

[{'sequence': '<s>if (x is not None) real(x>1)</s>', 'score': 0.9961338043212891, 'token': 588, 'token_str': 'Ġreal'}, {'sequence': '<s>if (x is not None)n(x>1)</s>', 'score': 1.70519979292294e-05, 'token': 282, 'token_str': 'n'}, {'sequence': '<s>if (x is not None) security(x>1)</s>', 'score': 1.5919968063826673e-05, 'token': 573, 'token_str': 'Ġsecurity'}, {'sequence': '<s>if (x is not None) Saturday(x>1)</s>', 'score': 1.5472969607799314e-05, 'token': 378, 'token_str': 'ĠSaturday'}, {'sequence': '<s>if (x is not None) here(x>1)</s>', 'score': 1.543204598419834e-05, 'token': 259, 'token_str': 'Ġhere'}]
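A plausible explanation: RobertaModel is the bare encoder with no masked-LM head, so the pipeline ends up ranking 768-dimensional hidden states as if they were vocabulary logits (consistent with this, every token id in the bad output above is below 768). A minimal sketch of the shape mismatch, assuming a reasonably recent transformers version:

import torch
from transformers import RobertaTokenizer, RobertaModel, RobertaForMaskedLM

# Compare the first output tensor of the two model classes on the same input.
tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
inputs = tokenizer("if (x is not None) <mask> (x>1)", return_tensors='pt')

base = RobertaModel.from_pretrained('microsoft/codebert-base-mlm')       # bare encoder
mlm = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')  # encoder + LM head

with torch.no_grad():
    base_out = base(**inputs)[0]  # last hidden states: (1, seq_len, hidden_size=768)
    mlm_out = mlm(**inputs)[0]    # vocabulary logits:  (1, seq_len, vocab_size=50265)

print(base_out.shape, mlm_out.shape)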
JetRunner commented 4 years ago

I'm working on a fix now.

ashutosh-dwivedi-e3502 commented 4 years ago

This bug occurs irrespective of the transformers version; I checked 2.8.0, 2.9.0, and 3.0.1.

The pipeline returns incorrect output only when model and tokenizer instances are used to initialize it.

If you instead pass the model and tokenizer parameters as path strings, the output is fine. The following snippet demonstrates this:

from transformers import RobertaModel, RobertaTokenizer, pipeline

MODEL_PATH = 'roberta-base'

# Instances loaded explicitly: RobertaModel is the bare encoder, without the LM head.
model = RobertaModel.from_pretrained(MODEL_PATH)
tokenizer = RobertaTokenizer.from_pretrained(MODEL_PATH)

# Pipeline built from path strings: the task decides which model class gets loaded.
fill_from_path = pipeline(
    'fill-mask',
    model=MODEL_PATH,
    tokenizer=MODEL_PATH
)

# Pipeline built from the instances above.
fill_from_model = pipeline(
    'fill-mask',
    model=model,
    tokenizer=tokenizer
)

seq = 'I found a bug in <mask>'
print(fill_from_path(seq))
print(fill_from_model(seq))

The output is below. The first result, where we passed the model path, is fine; the second, where we passed the model and tokenizer instances, is wrong.

[{'sequence': '<s> I found a bug in Firefox</s>', 'score': 0.051126863807439804, 'token': 30675}, {'sequence': '<s> I found a bug in Gmail</s>', 'score': 0.027283240109682083, 'token': 29004}, {'sequence': '<s> I found a bug in Photoshop</s>', 'score': 0.024683473631739616, 'token': 35197}, {'sequence': '<s> I found a bug in Java</s>', 'score': 0.021543316543102264, 'token': 24549}, {'sequence': '<s> I found a bug in Windows</s>', 'score': 0.018485287204384804, 'token': 6039}]
[{'sequence': '<s> I found a bug in real</s>', 'score': 0.9705745577812195, 'token': 588}, {'sequence': '<s> I found a bug in here</s>', 'score': 0.00013350950030144304, 'token': 259}, {'sequence': '<s> I found a bug in within</s>', 'score': 6.807789031881839e-05, 'token': 624}, {'sequence': '<s> I found a bug in San</s>', 'score': 6.468965875683352e-05, 'token': 764}, {'sequence': '<s> I found a bug in 2015</s>', 'score': 6.282260437728837e-05, 'token': 570}]
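If that is what is going on, the two pipelines above should be holding different model classes: given a path string, the fill-mask task presumably resolves the checkpoint with its own masked-LM auto class, whereas a passed-in instance is used exactly as given. Continuing from the snippet above, a quick check (the Pipeline object exposes the loaded model as .model):

# Which model class did each pipeline end up with?
print(type(fill_from_path.model).__name__)   # expected: RobertaForMaskedLM (resolved from the path for the task)
print(type(fill_from_model.model).__name__)  # expected: RobertaModel (used as passed in, no LM head)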
julien-c commented 4 years ago

@ashutosh-dwivedi-e3502 Try changing this line, model = RobertaModel.from_pretrained(MODEL_PATH), to model = AutoModelForMaskedLM.from_pretrained(MODEL_PATH)
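Applied to the snippet above, the suggested change would look roughly like this (a sketch; RobertaForMaskedLM would work just as well, the point being to load a class that carries the masked-LM head):

from transformers import AutoModelForMaskedLM, RobertaTokenizer, pipeline

MODEL_PATH = 'roberta-base'

# Load a model class that includes the masked-LM head, as suggested above.
model = AutoModelForMaskedLM.from_pretrained(MODEL_PATH)
tokenizer = RobertaTokenizer.from_pretrained(MODEL_PATH)

fill_from_model = pipeline('fill-mask', model=model, tokenizer=tokenizer)
print(fill_from_model('I found a bug in <mask>'))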

ashutosh-dwivedi-e3502 commented 4 years ago

@julien-c That fixes it. Output with model = AutoModelForMaskedLM.from_pretrained(MODEL_PATH) is:

Some weights of RobertaForMaskedLM were not initialized from the model checkpoint at roberta-base and are newly initialized: ['lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/Users/asdwivedi/.virtualenvs/test-demo-TklxO9OB/lib/python3.8/site-packages/transformers/modeling_auto.py:796: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  warnings.warn(
Some weights of RobertaForMaskedLM were not initialized from the model checkpoint at roberta-base and are newly initialized: ['lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[{'sequence': '<s>I found a bug in Firefox</s>', 'score': 0.05709619075059891, 'token': 30675, 'token_str': 'ĠFirefox'}, {'sequence': '<s>I found a bug in Gmail</s>', 'score': 0.03430333733558655, 'token': 29004, 'token_str': 'ĠGmail'}, {'sequence': '<s>I found a bug in WordPress</s>', 'score': 0.028388172388076782, 'token': 33398, 'token_str': 'ĠWordPress'}, {'sequence': '<s>I found a bug in Java</s>', 'score': 0.02571324072778225, 'token': 24549, 'token_str': 'ĠJava'}, {'sequence': '<s>I found a bug in Python</s>', 'score': 0.01953786611557007, 'token': 31886, 'token_str': 'ĠPython'}]
[{'sequence': '<s>I found a bug in Firefox</s>', 'score': 0.05709619075059891, 'token': 30675, 'token_str': 'ĠFirefox'}, {'sequence': '<s>I found a bug in Gmail</s>', 'score': 0.03430333733558655, 'token': 29004, 'token_str': 'ĠGmail'}, {'sequence': '<s>I found a bug in WordPress</s>', 'score': 0.028388172388076782, 'token': 33398, 'token_str': 'ĠWordPress'}, {'sequence': '<s>I found a bug in Java</s>', 'score': 0.02571324072778225, 'token': 24549, 'token_str': 'ĠJava'}, {'sequence': '<s>I found a bug in Python</s>', 'score': 0.01953786611557007, 'token': 31886, 'token_str': 'ĠPython'}]