huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

IDEFICS-9B outputs NaN with transformers 4.33.2 #32426

Open ForJadeForest opened 1 month ago

ForJadeForest commented 1 month ago

System Info

Who can help?

@amyeroberts @SunMarc

Reproduction

I ran the official example code on my machine but got invalid output.

import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "HuggingFaceM4/idefics-9b"
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)
processor = AutoProcessor.from_pretrained(checkpoint)

# We feed to the model an arbitrary sequence of text strings and images. Images can be either URLs or PIL Images.
prompts = [
    [
        "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
        "In this picture from Asterix and Obelix, we can see"
    ],
]

# --batched mode
inputs = processor(prompts, return_tensors="pt").to(device)
# --single sample mode
# inputs = processor(prompts[0], return_tensors="pt").to(device)

# Generation args
bad_words_ids = processor.tokenizer(["<image>", "<fake_token_around_image>"], add_special_tokens=False).input_ids

generated_ids = model.generate(**inputs, bad_words_ids=bad_words_ids, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
for i, t in enumerate(generated_text):
    print(f"{i}:\n{t}\n")

The output is below:

(['In this picture from Asterix and Obelix, we can see'],
 tensor([[    1, 32000, 32001, 32000,   512,   445,  7623,   515,   319,  2475,
            861,   322,  4250,   295,   861, 29892,   591,   508,  1074,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0]],
        device='cuda:6'))

I ran a forward pass and found that the logits are all NaN.

image

But when I upgrade the transformers package to 4.38.2 (upgrading only transformers), the result is correct.

image
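One way to narrow down where the NaNs first appear is to attach forward hooks to every submodule and record the first one whose output contains NaN. The following is a minimal sketch, not part of the original report: `find_first_nan_module` is a hypothetical helper name, and it only inspects outputs that are plain tensors (tuple or dataclass outputs are skipped).

```python
import torch
from torch import nn


def find_first_nan_module(model: nn.Module, *args, **kwargs):
    """Run one forward pass and return the name of the first submodule
    whose tensor output contains NaN, or None if no such output is seen.

    Hypothetical debugging helper; only plain-tensor outputs are checked.
    """
    first_bad = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Record only the first offending module; later NaNs are
            # usually just propagated from it.
            if first_bad or not isinstance(output, torch.Tensor):
                return
            if torch.isnan(output).any():
                first_bad.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))

    try:
        with torch.no_grad():
            model(*args, **kwargs)
    finally:
        # Always detach the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()

    return first_bad[0] if first_bad else None
```

With the reproduction script above, this could be called as `find_first_nan_module(model, **inputs)`. A result of `None` means no tensor-valued submodule output contained NaN; a module name points at the earliest layer worth inspecting in the failing environment.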

Expected behavior

Different versions of Transformers should generate the same output.

amyeroberts commented 1 month ago

Hi @ForJadeForest,

This is the desired behaviour: we don't want NaNs in our outputs, and the latest version doesn't produce them. It's true that we generally want consistent behaviour across versions; however, in this case it appears there's a fix, which we want.

ForJadeForest commented 1 month ago

Yeah, but I tried my old conda env, which also has transformers 4.33.2, and got the correct output. The old env's system info is:

- `transformers` version: 4.33.2
- Platform: Linux-4.15.0-76-generic-x86_64-with-glibc2.27
- Python version: 3.10.14
- Huggingface_hub version: 0.24.3
- Safetensors version: 0.4.3
- Accelerate version: 0.33.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

The correct output with 4.33.2 (old env): image

The erroneous output with 4.33.2: image

The output with 4.38.2: image

Btw, I found that the first 4.33.2 output differs from the 4.38.2 output (with do_sample set to False). This is very strange, and it might affect my performance testing.

amyeroberts commented 1 month ago

@ForJadeForest I'm not sure I understand the issue being reported. In the issue description, you said:

transformers version: 4.33.2 I ran the official example code on my machine but got invalid output. I ran a forward pass and found that the logits are all NaN. But when I upgrade the transformers package to 4.38.2 (upgrading only transformers), the result is correct.

And then in the reply you said

Yeah, but I tried my old conda env, which also has transformers 4.33.2, and got the correct output.

So do you get the expected output with 4.33.2 or not?

Why are you comparing specifically with 4.38.2? Could you compare with the output on the most stable release -- 4.43.2? If there's a bug, we can add commits which fix things for future releases, but we can't change the behaviour of older released versions.

ForJadeForest commented 1 month ago

Sorry for my mistake. I meant there are three environments:

  1. The first environment (env_1) has transformers version 4.33.2 (see figure 1).
  2. The second environment (env_2) also has transformers version 4.33.2 (see figure 2). env_2 has different dependencies from env_1.
  3. The third environment (env_3) is an upgraded version of env_2 with transformers version 4.38.2 (see figure 3).

The other dependency packages are not identical between env_1 and env_2. However, env_2 and env_3 have the same dependency packages.
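To pin down exactly which packages differ between two such environments, the installed versions can be dumped in each one and then compared. This is a minimal sketch using only the standard library; `snapshot_versions` and `diff_versions` are hypothetical helper names, and the file names in the comments are placeholders.

```python
import json
from importlib import metadata


def snapshot_versions() -> dict:
    """Map each installed distribution name (lowercased) to its version."""
    return {
        dist.metadata["Name"].lower(): dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    }


def diff_versions(a: dict, b: dict) -> dict:
    """Return {package: (version_in_a, version_in_b)} for every package
    whose version differs; a missing package is reported as None."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# In each environment (placeholder file name):
#     with open("env_1.json", "w") as f:
#         json.dump(snapshot_versions(), f, indent=2)
#
# Then, in any environment, load both files and compare:
#     with open("env_1.json") as f1, open("env_2.json") as f2:
#         print(diff_versions(json.load(f1), json.load(f2)))
```

Since env_1 and env_2 share the same transformers version, any entry in the resulting diff is a candidate for the package that triggers the NaN logits.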

I have two questions:

  1. I tested transformers 4.38.2 (env_3) and it works, but I don't understand why transformers 4.33.2 (env_2) fails. Maybe it can be solved by pinning the version of some other dependency. Besides, I upgraded transformers in env_3 to 4.43.2, which gives the same output as 4.38.2.
  2. I found that env_1 and env_3 produce different outputs when do_sample = False. This is problematic because I am trying to reproduce a paper, and the performance of env_3 on all tasks (image captioning, VQA) is lower than that of env_1. (Compared with figure 1, the outputs in figure 3 are simpler and contain less information.)
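To make the second question concrete, the token ids generated with do_sample = False can be saved in each environment (for example via `generated_ids[0].tolist()`) and compared position by position. The helper below is a minimal sketch; `first_divergence` is a hypothetical name, not something from transformers.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index at which two token-id sequences differ.

    If one sequence is a strict prefix of the other, the length of the
    shorter one is returned. Identical sequences give None.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

With greedy decoding, every token after the first differing position depends on a different prefix, so the index returned here is the step whose logits are worth comparing between env_1 and env_3.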

github-actions[bot] commented 11 hours ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.