huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

SqueezeBert does not appear to properly generate text #8277

Closed huu4ontocord closed 3 years ago

huu4ontocord commented 3 years ago

Environment info

Google Colab, using a CPU runtime with high RAM

Who can help

@sgugger @forresti @LysandreJik

Information

Model I am using: squeezebert-uncased, squeezebert-mnli, etc.

The problem arises when trying to generate the most likely output for the input sequence and predict the masked tokens.

To reproduce


from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained('squeezebert/squeezebert-mnli')
tokenizer = AutoTokenizer.from_pretrained('squeezebert/squeezebert-mnli')
# model.tie_weights()

input_txt = ["[MASK] was an American [MASK]  and lawyer who served as the 16th president  of the United States from 1861 to 1865. [MASK] led the nation through the American Civil War, the country's greatest [MASK], [MASK], and [MASK] crisis. ",
             "George [MASK], who served as the first  president of the United States from [MASK] to 1797, was an American political leader, [MASK] [MASK], statesman, and Founding Father. Previously, he led Patriot forces to [MASK] in the nation's War for Independence. ",
             "[MASK], the first African-American [MASK] of the [MASK] [MASK], is an American politician and attorney who served as the 44th [MASK] of the United States from [MASK] to 2017.  [MASK] was a member of the [MASK] [MASK]. "]

# replace the literal [MASK] placeholders with the tokenizer's actual mask token
input_txt = [i.replace("[MASK]", tokenizer.mask_token) for i in input_txt]

inputs = tokenizer(input_txt, return_tensors='pt', add_special_tokens=True, padding=True)
inputs['output_attentions'] = True
inputs['output_hidden_states'] = True
inputs['return_dict'] = True
outputs = model(**inputs)

# print the top-2 predicted tokens at every position of each sequence
predictions = outputs.logits
for pred in predictions:
    print("**")
    sorted_preds, sorted_idx = pred.sort(dim=-1, descending=True)
    for k in range(2):
        predicted_index = [sorted_idx[i, k].item() for i in range(len(predictions[0]))]
        predicted_token = ' '.join([tokenizer.convert_ids_to_tokens([predicted_index[x]])[0] for x in range(1, len(predictions[0]))]).replace('Ġ', ' ').replace('  ', ' ').replace('##', '')
        print(predicted_token)
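
As an aside, a more direct way to read off just the top prediction at each masked slot is sketched below. It is only a minimal sketch that reuses the model, tokenizer, inputs, and outputs objects created above, relying on standard calls (torch.where, tokenizer.mask_token_id, tokenizer.decode):

import torch

# Sketch: fill each [MASK] slot with the single most likely token and decode
# the whole sequence, reusing `inputs`, `outputs`, and `tokenizer` from above.
mask_positions = inputs["input_ids"] == tokenizer.mask_token_id   # True where a mask token sits
top_ids = outputs.logits.argmax(dim=-1)                           # best token id at every position
for seq_idx in range(top_ids.size(0)):
    # keep the original tokens everywhere except the masked slots
    filled = torch.where(mask_positions[seq_idx], top_ids[seq_idx], inputs["input_ids"][seq_idx])
    print(tokenizer.decode(filled, skip_special_tokens=True))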

Expected behavior

I expected at least the input to be echoed back, with the masked slots filled in with Lincoln, Washington, and Obama. This works for bert, distilbert, roberta, etc.
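
For comparison, a minimal sketch of the same kind of mask filling with the stock bert-base-uncased checkpoint and the fill-mask pipeline, shown only to illustrate the behavior I expected (the sentence is just an example):

from transformers import pipeline

# Comparison sketch with a plain BERT checkpoint (bert-base-uncased)
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("George Washington served as the first [MASK] of the United States from 1789 to 1797."))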

Actual output

Some weights of the model checkpoint at squeezebert/squeezebert-mnli were not used when initializing SqueezeBertForMaskedLM: ['classifier.weight', 'classifier.bias']

LysandreJik commented 3 years ago

Hello! First of all, you're using the squeezebert-mnli checkpoint, which is a checkpoint that was fine-tuned on the MNLI dataset. It cannot be used to do masked language modeling.

I believe you should be using the squeezebert-uncased checkpoint instead.

However, even when using that checkpoint with the MLM pipeline I cannot obtain sensible results. Maybe @forresti can chime in and let us know if something's up!

huu4ontocord commented 3 years ago

Thanks @LysandreJik. I tried both squeezebert-mnli and squeezebert-uncased (not shown) and got the same type of results. Thanks for checking. @forresti any thoughts? Is there something wrong with the squeezebert tokenizer?

forresti commented 3 years ago

@ontocord Sorry for the slow reply. I will dig into this on Thursday this week.

forresti commented 3 years ago

@ontocord Thanks so much for bringing this to my attention! I was able to reproduce the issue. And, I think I was able to fix the issue in PR #8479.

Now, let's try running your example code with...

... this produces the following output:

Some weights of the model checkpoint at squeezebert/squeezebert-uncased were not used when initializing SqueezeBertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing SqueezeBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing SqueezeBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of SqueezeBertForMaskedLM were not initialized from the model checkpoint at squeezebert/squeezebert-uncased and are newly initialized: ['transformer.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
**
he was an american politician and lawyer who served as the 16th president of the united states from 1861 to 1865 . he led the nation through the american civil war , the country ' s greatest war , war , and economic crisis . , war , economic economic war
johnson is a americans statesman & attorney and serve the interim 17th presidency in of confederate state in 1860 until 1866 " white lead a country throughout a america black wars and a nation ’ largest largest economic and famine and , political crises " and famine war and political crisis
**
george washington , who served as the first president of the united states from 1796 to 1797 , was an american political leader , patriot patriot , statesman , and founding father . previously , he led patriot forces to victory in the nation ' s war for independence . ,
james harrison s jr serve in inaugural inaugural presidency in s u united in 1789 until 1799 ) is a americans politician figure and military statesman and politician and , adoptive fathers " historically was his lead revolutionary troops in fight during a country ’ the fight of freedom " and
**
johnson , the first african - american president of the united states , is an american politician and attorney who served as the 44th president of the united states from 2016 to 2017 . he was a member of the republican party . , john the republican republican party . the
williams is , second black – americans governor in this colored senate islander was a americans political , lawyer , serves the a 43rd governor for of union state in 2015 until 2016 , she is an part the house democratic assembly " . james senate democratic democratic assembly party and

Alas, the model seems to think Obama's name is "Johnson," but it does get George Washington correct.

Anyway, does this output look a bit more like what you expected? :)

LysandreJik commented 3 years ago

Thanks a lot @forresti! This works as well with the fill-mask pipeline:

>>> from transformers import AutoModelForMaskedLM, AutoTokenizer

>>> model = AutoModelForMaskedLM.from_pretrained('squeezebert/squeezebert-uncased')
>>> tokenizer = AutoTokenizer.from_pretrained('squeezebert/squeezebert-uncased')
>>> input_txt = [
...     "George Washington, who served as the first [MASK] of the United States from 1789 to 1797, was an American political leader."
... ]

>>> from transformers import pipeline
>>> nlp = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> print(nlp(input_txt))
[{'sequence': '[CLS] george washington, who served as the first president of the united states from 1789 to 1797, was an american political leader. [SEP]', 'score': 0.9644643664360046, 'token': 2343, 'token_str': 'president'}, {'sequence': '[CLS] george washington, who served as the first governor of the united states from 1789 to 1797, was an american political leader. [SEP]', 'score': 0.026940250769257545, 'token': 3099, 'token_str': 'governor'}, {'sequence': '[CLS] george washington, who served as the first king of the united states from 1789 to 1797, was an american political leader. [SEP]', 'score': 0.0013772461097687483, 'token': 2332, 'token_str': 'king'}, {'sequence': '[CLS] george washington, who served as the first lieutenant of the united states from 1789 to 1797, was an american political leader. [SEP]', 'score': 0.0012003666488453746, 'token': 3812, 'token_str': 'lieutenant'}, {'sequence': '[CLS] george washington, who served as the first secretary of the united states from 1789 to 1797, was an american political leader. [SEP]', 'score': 0.0008091009221971035, 'token': 3187, 'token_str': 'secretary'}]
huu4ontocord commented 3 years ago

Thanks @forresti! Yes, this fixes the problem! Thank you @LysandreJik as well! I noticed that different models have different capacities to store facts, roughly in line with the number of parameters, but not always. As a question, do you know of any models that are trained to predict a relationship rather than a word at the masked position, e.g. leader($X, president, united_states, 1789, 1797) for "... served as the first president of the united states from 1789 to 1797 ..."? In theory this should reduce the number of facts the model needs to learn, since the relationships are already being learned by the attention mechanism, I believe.
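
For what it's worth, one way to picture that kind of training signal is to serialize the relation tuple next to the sentence and mask the predicate, so the model is asked to predict the relation rather than an entity. The sketch below is purely illustrative: the make_relation_example helper and the template format are hypothetical, not anything an existing checkpoint was trained on.

# Hypothetical sketch of a relation-style cloze example: mask the predicate
# in the serialized tuple instead of an entity in the sentence.
def make_relation_example(subj, obj, start, end, mask_token="[MASK]"):
    # spell out the relation tuple alongside the sentence; the relation slot is masked
    tuple_part = f"{mask_token}($X, {obj}, {start}, {end})"
    sentence = f"{subj} served as the first president of {obj} from {start} to {end}."
    return tuple_part + " " + sentence

print(make_relation_example("george washington", "the united states", 1789, 1797))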