Generative models return the same responses to all questions

serenalotreck commented 1 year ago

System Info

transformers version: 4.25.1
Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Python version: 3.7.16
Huggingface_hub version: 0.13.3
PyTorch version (GPU?): 1.13.1+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker and @younesbelkada

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

Problem: Different questions to a conversational pipeline produce the same answer for a given model. This happens across multiple models, and persists even when a new Python session is started between runs.

Code to reproduce:

from transformers import pipeline, Conversation

# First question, asked of each model in turn
for model in ['facebook/opt-1.3b', 'bigscience/bloom-560m', 'gpt2']:
    generator = pipeline(task='conversational', model=model)
    convo = Conversation('Should I see a movie tonight?')
    print(generator(convo))

# Second, unrelated question, asked of the same models
for model in ['facebook/opt-1.3b', 'bigscience/bloom-560m', 'gpt2']:
    generator = pipeline(task='conversational', model=model)
    convo = Conversation('What do you know about biology?')
    print(generator(convo))

Outputs:

From the first for loop:

Conversation id: 9335b8bb-d73e-4fb0-91e3-bb0dbf62dd76 
user >> Should I go see a movie tonight? 
bot >> I'm not sure if this is a good idea. 
Conversation id: 03c41e56-35b1-4b02-9757-4bf1c90a6f32 
user >> Should I go see a movie tonight? 
bot >> The first thing you need to do is to get a
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Conversation id: 62e5f6f0-7e6e-4a5c-baf6-93ea40e31b85 
user >> Should I see a movie tonight? 
bot >> The first time I saw the new Star Wars movie, I

From the second for loop:

Conversation id: f14a10d8-3661-482e-8b95-bb0a417a0afd 
user >> What do you know about biology? 
bot >> I'm not sure if this is a good idea.  
Conversation id: 24866d8e-bfc8-4ebf-825e-b90965ab60b7 
user >> What do you know about biology? 
bot >> The first thing you need to do is to get a good 
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Conversation id: 40d35c22-cf89-4750-931e-f75a5d80431b 
user >> What do you know about biology? 
bot >> The first time I saw the new Star Wars movie, I

Expected behavior

Sensible answers that respond to the question, rather than a canned response that doesn't make sense in context.

amyeroberts commented 1 year ago

cc @Narsil @gante

gante commented 1 year ago

Hey @serenalotreck 👋

The models you're trying to use are not compatible with the conversational pipeline. That's why you see the same output on a given model, regardless of the input.

Check these docs: "The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task, currently: ‘microsoft/DialoGPT-small’, ‘microsoft/DialoGPT-medium’, ‘microsoft/DialoGPT-large’. See the up-to-date list of available models on huggingface.co/models."
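For reference, here is a minimal sketch of the reproduction script pointed at one of the models quoted above. It assumes a transformers release that still ships the conversational pipeline and the `Conversation` class (such as the 4.25.1 reported in this issue). The `DIALOGPT_MODELS` tuple and the `ask` helper are hypothetical names for illustration, and the allow-list only covers the three names in the quote, not every conversational model on the Hub.

```python
# Model names quoted from the pipeline docs above.
DIALOGPT_MODELS = (
    "microsoft/DialoGPT-small",
    "microsoft/DialoGPT-medium",
    "microsoft/DialoGPT-large",
)


def ask(question, model_name=DIALOGPT_MODELS[0]):
    """Send one question through the conversational pipeline and return the reply.

    Hypothetical helper: it refuses model names outside the quoted list, since
    models not fine-tuned for dialogue are what produced the repeated answers.
    """
    if model_name not in DIALOGPT_MODELS:
        raise ValueError(
            f"{model_name} is not in the quoted list of conversational models"
        )
    # Imported lazily so the check above works without transformers installed.
    from transformers import pipeline, Conversation

    generator = pipeline(task="conversational", model=model_name)
    convo = generator(Conversation(question))
    # The pipeline appends the bot reply to the conversation it returns.
    return convo.generated_responses[-1]
```

Calling `ask('Should I see a movie tonight?')` downloads the model on first use, so it needs network access.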

P.S.: you might be able to get conversational-like behavior from a standard text generation pipeline, using models like open assistant, but we don't have step-by-step docs for that at the moment. Check the model card for high-level instructions.

serenalotreck commented 1 year ago

@gante that makes sense, thank you!

I'm currently looking for open source alternatives to GPT-3.5 that I can use with an API for relation extraction through a series of prompts (e.g. "Rewrite this sentence into multiple sentences, each containing only one relation", or "Extract an SPO triple from the following sentence").

Do you happen to know if models other than Open Assistant can be used in the same manner? The models in the code example above all come from the search results for Text Generation models, and claim to be open source alternatives to research LLMs. Even with text-generation pipelines, though, I haven't been able to get responses that mimic what ChatGPT can do. With GPT-2, for example, the "Rewrite this sentence" prompt just adds to the sentence instead of rewriting it. I suspect I may be doing something wrong with how I'm building my pipelines. I'll give Open Assistant a shot in the meantime!

Any thoughts are appreciated, thanks!

gante commented 1 year ago

@serenalotreck you can check this leaderboard to see the highest scoring open-source LLMs.

The catch is that they need a carefully crafted input prompt (also known as a system prompt) before they behave like helpful assistants such as ChatGPT. ChatGPT has one too, but it is hidden from you. Here's a simple example, for the case of Open Assistant -- you may be able to find more online :)
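Separately from the linked example, a minimal sketch of the idea: prepend a fixed preamble to the user's instruction and leave the assistant turn open for the model to complete. The preamble wording, the `User:`/`Assistant:` labels, and the `build_prompt` helper are all assumptions for illustration, not what any particular model was trained on.

```python
# Hypothetical preamble; real assistants use their own (often unpublished) wording.
SYSTEM_PROMPT = (
    "Below is a conversation between a user and a helpful assistant. "
    "The assistant follows instructions exactly and answers concisely."
)


def build_prompt(instruction, system_prompt=SYSTEM_PROMPT):
    """Prepend the system prompt and leave the assistant turn open."""
    return f"{system_prompt}\n\nUser: {instruction.strip()}\nAssistant:"


prompt = build_prompt(
    "Extract an SPO triple from the following sentence: Cats chase mice."
)
print(prompt)
```

The resulting string is what you would pass to a `pipeline(task='text-generation', ...)` generator in place of the bare question.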

As per our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum 🤗

Narsil commented 1 year ago

And even more than the system prompt, there is usually a specific token sequence used during the model finetuning, which is critical to get a good output.

For instance, OpenAssistant's biggest model uses "<|prompt_begin|><|prompter|>something something<|assistant|>", and different models use different prompt formats. Unfortunately, too many models are being released at the same time right now, and it's impossible to include all of these model-specific parts everywhere.
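One way to keep these per-model formats manageable in your own code is a small lookup table, sketched below. The `PROMPT_TEMPLATES` registry and the `format_prompt` helper are hypothetical. The OpenAssistant entry is copied from the format quoted in this comment and should be checked against the model card before use; the `plain` fallback is an assumption.

```python
# Hypothetical registry: each entry should be copied from the model's card.
PROMPT_TEMPLATES = {
    # Format as quoted in the comment above; verify against the model card.
    "openassistant": "<|prompt_begin|><|prompter|>{instruction}<|assistant|>",
    # Plain fallback for models without special tokens (an assumption).
    "plain": "User: {instruction}\nAssistant:",
}


def format_prompt(family, instruction):
    """Wrap an instruction in the token sequence registered for a model family."""
    try:
        template = PROMPT_TEMPLATES[family]
    except KeyError:
        raise ValueError(f"no prompt template registered for {family!r}") from None
    return template.format(instruction=instruction)


print(format_prompt("openassistant", "Rewrite this sentence into multiple sentences."))
```

Failing loudly on an unknown family is deliberate here: silently falling back to a plain prompt would reproduce the off-topic completions reported in this issue.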

https://huggingface.co/chat/ should give you an idea of what the OpenAssistant model is capable of. OpenAssistant also has its own front end for its models: https://open-assistant.io/chat

serenalotreck commented 1 year ago

Thank you all so much, that's super helpful!!

gante commented 1 year ago

@serenalotreck this link might also be relevant to you: https://github.com/oobabooga/text-generation-webui/tree/main/characters/instruction-following

It contains the prompt templates used to drive specific models.
