Hey @abarbet 👋
This issue may arise when beam search, sampling, and long outputs are used together. A potential bug in PyTorch itself compounds it. You can read the full story in this issue.
TL;DR -- my immediate suggestion would be to avoid using `num_beams` and `do_sample` together. If you want to use them both, you'll have to read the issue linked above, which describes the problem and solutions :)
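Not from the thread, but a minimal sketch of what that suggestion looks like in practice, assuming a generic flan-T5 checkpoint (`google/flan-t5-small`) and an arbitrary prompt:

```python
# Sketch (not from the thread): keep num_beams and do_sample in separate generate() calls.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
inputs = tokenizer("Summarize: the quick brown fox jumps over the lazy dog.", return_tensors="pt")

# Option 1: pure beam search (deterministic), no sampling flags
beam_out = model.generate(**inputs, num_beams=4, max_new_tokens=64)

# Option 2: pure sampling, no beams
sample_out = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=64)

print(tokenizer.batch_decode(beam_out, skip_special_tokens=True))
print(tokenizer.batch_decode(sample_out, skip_special_tokens=True))
```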
Ah thank you, that issue is very helpful! Do you have any idea why we would see a similar error in `trlX` training despite not using beam sampling? I know you don't have access to my training script and most likely aren't familiar with their codebase, so this is a complete long shot.
The only thing I can think of, if it's not caused by a sampling bug, is some kind of destructive learning in the PPO step that causes the token distributions to get completely out of whack.
@abarbet It may be due to this PyTorch issue, where the sampling step can pick very low-probability tokens that it shouldn't, which in turn causes computations to derail.
Try running your script with PT 1.x instead of 2.0!
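For context, the exact RuntimeError reported in this thread is the one `torch.multinomial` raises when the probability tensor it receives already contains `inf`/`nan`. A tiny sketch that reproduces the message directly (not the underlying generation bug):

```python
# Sketch (not from the thread): feed torch.multinomial a corrupted distribution
# to see the same error message that generate()'s sampling step surfaces.
import torch

probs = torch.tensor([0.5, float("nan"), 0.5])  # corrupted probability distribution
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0
```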
For me, this issue also occurs with pytorch 1.13.1 https://github.com/huggingface/transformers/issues/22914#issuecomment-1562034753
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello, has a fix been found for this issue? I'm using the latest version of `transformers` and can confirm that running inference with `model.generate()` and parameters such as `temperature` and `do_sample` causes this issue.
```python
# model, tokenizer, inputs and max_length are defined earlier in my script
# (BRIO model with pre-trained weights from the Hub, see edit2 below)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=max_length,
    min_length=128,
    temperature=0.1,  # low temperature; together with do_sample this triggers the error
    do_sample=True,
    # top_p=0.3       # sampling with top_p alone works fine, see edit below
)
```
edit: I can confirm now that `do_sample` and `temperature` together are the cause of the issue, as `top_p` alone works fine for me.
edit2: I forgot to mention that the model I'm using is BRIO, loading pre-trained weights from HF.
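For reference, a sketch of the configuration reported as working, assuming the same `model`, `inputs`, and `max_length` as in the snippet above:

```python
# Sketch (not from the thread): nucleus sampling without an explicit temperature,
# which the poster reports as not triggering the error.
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=max_length,
    min_length=128,
    do_sample=True,
    top_p=0.3,
)
```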
@yungsinatra0 The issue should only be gone with the next PT release (i.e. `torch>2.0`)
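A generic check (not from the thread) to see which PyTorch version is installed before deciding whether the fix applies to your environment:

```python
# Print the installed PyTorch version; the comment above says the fix lands in a
# release newer than 2.0.
import torch

print(torch.__version__)
```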
System Info

`transformers` version: 4.27.1

Who can help?

@ArthurZucker @gante

Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
This has most recently arisen when using `trlX` to do reinforcement learning on `flan-T5`. I wrote an issue on their own repo, but there seems to be no response, and it is somewhat better suited as an issue in this repo since it has `transformers` code at its core.

The main issue is that `generate` with a seq2seq model, namely `flan-t5`, sometimes produces the following error: RuntimeError: probability tensor contains either `inf`, `nan` or element < 0. This has been well documented in other issues like this one, but the behavior in that issue is more custom than calling `generate` in its standard configuration.

Here is a code example to reproduce:
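(The original snippet is not preserved in this extract; below is a minimal sketch of the setup described, assuming the small flan-T5 checkpoint and an arbitrary prompt, with `do_sample` and `temperature` as in the report.)

```python
# Sketch of the described setup, not the poster's original code: flan-T5 with
# do_sample and temperature, which intermittently raises the probability-tensor error.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")  # assumed checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```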
NB: `temperature` seems to be one of the main causes of this issue, as removing this kwarg from the generate call does not produce the error in the above case. However, that is not true of all cases. I have seen the error in my `trlX` training loops with kwargs as simple as `{"max_new_tokens": 512, "do_sample": True, "top_k": 0, "top_p": 1}`. Thus it seems this error is not always related to temperature.

Expected behavior
The expected behavior in this case would be for the sampling to work every time instead of having strange edge cases where tokens are unreachable.