huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Blenderbot] Model yields weird results #9457

Closed patrickvonplaten closed 3 years ago

patrickvonplaten commented 3 years ago

As discussed with @Narsil offline, Blenderbot seems to yield weird generation results. I think we have to dive deeper into the original Parlai lib and make sure that there is no flaw in the model or generate function.

Also on my Todo list.

Also pinging @patil-suraj and @Narsil for notice.

patil-suraj commented 3 years ago

Yes, I'm actually investigating this, also see #9365

spark-ming commented 3 years ago

Any new insights into this issue?

Narsil commented 3 years ago

Yes, most of the work was done here:

https://github.com/huggingface/transformers/pull/10002 and https://github.com/huggingface/transformers/pull/9984

It was mostly linked to something that was not supported by the generate function at the time (namely `encoder_no_repeat_ngram_size`).

I've seen a few issues about Blenderbot creep up again (namely, questioning the separation scheme for conversation items). I haven't had time to dive back in to double-check, but at the time of the mentioned PRs the separation scheme was triple-checked against the master branch of ParlAI (the questions referred to the docs, which could always be outdated).

Also keep in mind that ParlAI uses additional schemes to prevent the model from outputting odd things: a hardcoded banned-word list plus an actual model to detect anything inappropriate (maybe more; what I found was well out of scope for transformers and extremely specific to Blenderbot). The "persona" feature is usable within transformers, but it relies on a trick: a "persona" is just a prompt at the start of the conversation that looks like "your persona: You live in a mansion". So prefixing your conversation with "your persona: You live in a mansion Hi there!" should yield the same results as Blenderbot. Check the ParlAI implementation to confirm (I'm not sure about the actual casing used, and so on).
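The persona trick described above can be sketched as plain string prefixing; the exact casing and spacing are assumptions to confirm against ParlAI:

```python
# Hedged sketch: a persona in transformers' Blenderbot is just a text
# prefix on the conversation, not a separate API. Casing/spacing here
# mirror the example in the comment above and are unverified.
persona = "your persona: You live in a mansion"
first_turn = "Hi there!"

# Prefix the first user turn with the persona line, space-separated.
prompt = f"{persona} {first_turn}"
print(prompt)
```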

spark-ming commented 3 years ago

Thanks for the reply @Narsil, as well as the links to the related PRs. Yes, I'm aware of ParlAI's implementation of a safety detector. Thanks also for the point about the persona implementation; that is what I assumed, but it's great to have it confirmed.

Just to check: is the separation scheme a total of three spaces between turns (two in the join operator plus an extra at the start of each sentence)? That is what I see in tests/test_pipelines_conversational.py.

If so, the documentation may be outdated, as it uses </s> <s> between turns, which produces different results.
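The three-space scheme described above can be sketched as a plain string join (the helper name is illustrative, not a transformers API):

```python
def join_turns(turns):
    """Join conversation turns with the scheme described above:
    each turn is prefixed with one space, and turns are joined by
    two spaces, so three spaces separate consecutive turns."""
    return "  ".join(" " + turn for turn in turns)

history = join_turns(["Hello, how are you?", "I am fine, thanks."])
print(history)  # three spaces between the two turns
```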

Narsil commented 3 years ago

Yes, I confirmed that it was 3 spaces. It's supposed to be 4 spaces, but if I remember correctly, it was actually 2 + 1 hardcoded. I checked at the token level in the end, and it's 228, 228 all the time.

I had found the persona code, but the sentence splitting is spread out a bit and I can't locate it right away.

It's somewhere in https://github.com/facebookresearch/ParlAI/blob/master/parlai/core/torch_generator_agent.py if you want to start inspecting the live code.

spark-ming commented 3 years ago

> I had found the persona code, but the sentence splitting is spread out a bit and I can't locate it right away.
>
> It's somewhere in https://github.com/facebookresearch/ParlAI/blob/master/parlai/core/torch_generator_agent.py if you want to start inspecting the live code.

Perfect, thanks for the reference. I just did some poking around in the ParlAI library and confirmed the delimiter token in the history object. It matches what you found.

```python
from parlai.core.agents import create_agent_from_model_file

blender_agent = create_agent_from_model_file(
    "zoo:blender/blender_400Mdistill/model", {"skip_generation": False}
)

print(blender_agent.history.delimiter_tok)
# Output: [228, 228]
```

For persona, it looks like they just separate all the persona details with newlines and bundle them into the first turn, e.g.

your persona: I like cheese\nyour persona: I am from New York City[228, 228]Hi, where are you from[228, 228]Hi, I'm from the city of new york city. How about you? Do you like cheese?[228, 228]do you like cheese?[228, 228]Yes, I love cheese. It is one of my favorite foods. What is your favorite food?

Reference: https://github.com/facebookresearch/ParlAI/issues/2872
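The history layout above can be sketched as a string builder. This is a hedged illustration: `build_history` is a hypothetical helper, and the literal `"[228, 228]"` just marks where the delimiter token pair sits; in ParlAI the delimiter is a pair of token ids, not text.

```python
def build_history(persona_lines, turns, delim="[228, 228]"):
    """Sketch of the ParlAI history layout described above:
    persona lines are newline-joined into one leading block,
    then all turns are joined by the history delimiter."""
    persona_block = "\n".join(f"your persona: {p}" for p in persona_lines)
    return delim.join([persona_block] + list(turns))

example = build_history(
    ["I like cheese", "I am from New York City"],
    ["Hi, where are you from"],
)
print(example)
```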

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.