joeddav / zero-shot-demo


Cannot replicate results in live demo app #1

Closed ravisurdhar closed 4 years ago

ravisurdhar commented 4 years ago

Hi, I'm trying to replicate the core functionality of your live demo app in a Jupyter notebook that strips out all of the Streamlit code, but I'm having trouble replicating the results.

Example input: 'Who are you voting for in 2020?'

Actual output:
(['2020', 'elections', 'foreign policy', 'business', 'Europe', 'politics', 'outdoor recreation'],
 [0.021523168310523033, 0.021523168310523033, 0.021523168310523033, 0.021523164585232735, 0.021523164585232735, 0.021523121744394302, 0.021523121744394302])

As you can see, the probabilities for each label are virtually identical and all extremely low, while the live demo puts the probabilities for the first three labels above 95% and the others around 0.4%. It seems like the model is somehow being loaded in an untrained state, and I'm getting the following warning when I try to load either the facebook/bart-large-mnli model or the joeddav/bart-large-mnli-yahoo-answers model:

Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

However, some googling led me to this issue, which makes it sound like this warning is expected. I'm not sure why I'm getting the results I'm getting, though. Any help would be greatly appreciated!

Code

```
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_ids = {'Bart MNLI': 'facebook/bart-large-mnli'}

device = -1

def load_models():
    return {id: AutoModelForSequenceClassification.from_pretrained(id) for id in model_ids.values()}

models = load_models()

def load_tokenizer(tok_id):
    return AutoTokenizer.from_pretrained(tok_id)

hypothesis_template = 'This text is about {{}}.'

def get_most_likely(nli_model_id, sequence, labels, hypothesis_template, multi_class=True):
    classifier = pipeline('zero-shot-classification',
                          model=models[nli_model_id],
                          tokenizer=load_tokenizer(nli_model_id),
                          device=device)
    outputs = classifier(sequence, labels, hypothesis_template, multi_class)
    return outputs['labels'], outputs['scores']

test_seq = 'Who are you voting for in 2020?'
test_labels = [x.strip() for x in 'foreign policy, Europe, elections, business, 2020, outdoor recreation, politics'.strip().split(',')]

get_most_likely('facebook/bart-large-mnli', test_seq, test_labels, hypothesis_template)
```

I'm using Python 3.6, torch 1.6.0 (in non-CUDA mode), and transformers 3.1.0 on OS X 10.15.6.

joeddav commented 4 years ago

Yeah, that warning isn't the issue. Those outputs are definitely wrong, but I'm not sure why without seeing your code. You should just be able to do this:

from transformers import pipeline
classifier = pipeline("zero-shot-classification")
sequence = "Who are you voting for in 2020?"
candidate_labels = ['2020', 'elections', 'foreign policy', 'business', 'Europe', 'politics', 'outdoor recreation']
hypothesis_template = "This text is about {}."

classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template, multi_class=True)
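
As a usage note (a sketch of what that call returns, assuming transformers 3.x behavior):

```python
result = classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template, multi_class=True)
# For a single sequence the pipeline returns a dict with keys 'sequence', 'labels', and 'scores'.
# 'labels' is sorted by descending score, and with multi_class=True each score is an independent
# probability per label rather than part of a softmax over all the labels.
print(result['labels'])
print(result['scores'])
```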

Check out this notebook for more examples.

ravisurdhar commented 4 years ago

Weird, that code does run fine for me and I get the expected result. I'm not quite sure how that's significantly different from the code in my original post (it's in the collapsible section labeled Code above the last sentence). The only difference seems to be that I included these lines, which I basically copied from your demo app:

model_ids = {'Bart MNLI': 'facebook/bart-large-mnli'}
def load_models():
    return {id: AutoModelForSequenceClassification.from_pretrained(id) for id in model_ids.values()}
models = load_models()
joeddav commented 4 years ago

My only other thought is to make sure you're using the right tokenizer. If you pass an instantiated model object (rather than a string model identifier) to the pipeline factory, it can't infer which tokenizer to use so you have to pass tokenizer=tokenizer in addition to the model. Otherwise I'll need the whole of your code to spot the issue.
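
For example, here's a minimal sketch of the two ways to construct the pipeline (same model identifier as in the thread; nothing beyond the standard transformers API):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = 'facebook/bart-large-mnli'

# Option 1: pass the string identifier and let the pipeline load the matching model and tokenizer.
classifier = pipeline('zero-shot-classification', model=model_id)

# Option 2: pass an instantiated model object; the tokenizer can't be inferred from it,
# so it has to be passed explicitly alongside the model.
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline('zero-shot-classification', model=model, tokenizer=tokenizer)
```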

ravisurdhar commented 4 years ago

This is my entire code:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model_ids = {'Bart MNLI': 'facebook/bart-large-mnli'}

device = -1

def load_models():
    return {id: AutoModelForSequenceClassification.from_pretrained(id) for id in model_ids.values()}

models = load_models()

def load_tokenizer(tok_id):
    return AutoTokenizer.from_pretrained(tok_id)

hypothesis_template = 'This text is about {{}}.'

def get_most_likely(nli_model_id, sequence, labels, hypothesis_template, multi_class=True):
    classifier = pipeline('zero-shot-classification',
                          model=models[nli_model_id],
                          tokenizer=load_tokenizer(nli_model_id),
                          device=device)
    outputs = classifier(sequence, labels, hypothesis_template, multi_class)
    return outputs['labels'], outputs['scores']

test_seq = 'Who are you voting for in 2020?'
test_labels = [x.strip() for x in 'foreign policy, Europe, elections, business, 2020, outdoor recreation, politics'.strip().split(',')]

get_most_likely('facebook/bart-large-mnli', test_seq, test_labels, hypothesis_template)

And the output is:

(['2020',
  'elections',
  'foreign policy',
  'business',
  'Europe',
  'politics',
  'outdoor recreation'],
 [0.021523168310523033,
  0.021523168310523033,
  0.021523168310523033,
  0.021523164585232735,
  0.021523164585232735,
  0.021523121744394302,
  0.021523121744394302])

I seem to have found the issue, though I'm having a hard time understanding why this is the culprit: in my code I have hypothesis_template = 'This text is about {{}}.', which I copied from line 34 in your code. However, in your first comment in this issue, you have hypothesis_template = "This text is about {}.", with a single set of braces. If I change my code above to use a single set of braces, I get the expected results. Any idea why I'm seeing this behavior? I wouldn't expect the hypothesis_template formatting to affect the results of the classifier...

joeddav commented 4 years ago

Looks like you copied some unrendered markup. {{}} should just be {}. It's only doubled in the source because the braces are escaped there.
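
To spell out why the doubled braces silently break things, here's a quick sketch of plain Python str.format behavior (not specific to transformers):

```python
template_ok = 'This text is about {}.'
template_bad = 'This text is about {{}}.'

# The single braces are a placeholder, so the label gets substituted in.
print(template_ok.format('politics'))   # -> This text is about politics.

# Doubled braces are an escape for literal braces, so the label is silently dropped.
print(template_bad.format('politics'))  # -> This text is about {}.
```

With the doubled braces, every candidate label produces exactly the same hypothesis string, which is why all the scores came out virtually identical.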

ravisurdhar commented 4 years ago

I read through the source code for the ZeroShotClassificationPipeline and now it finally makes sense! My misunderstanding was that the hypothesis template was an output, i.e., that the model would return a string in the format I specified with the highest-probability label inserted in place of the {}. But now I understand that it's actually an input: the classifier fills the template with each candidate label and returns the probability that the resulting hypothesis is true of the sequence you supply.
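
In other words (a rough sketch of the idea, not the actual pipeline source):

```python
sequence = 'Who are you voting for in 2020?'
labels = ['politics', 'elections', 'outdoor recreation']
hypothesis_template = 'This text is about {}.'

# Each candidate label is formatted into the template to form one NLI hypothesis,
# and the model scores entailment of (premise=sequence, hypothesis) for each label.
pairs = [(sequence, hypothesis_template.format(label)) for label in labels]
# [('Who are you voting for in 2020?', 'This text is about politics.'), ...]
```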

Thanks for helping me out! I'll go ahead and close this issue.