huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

GPTNeoXForCausalLM examples fail to run #17632

Closed: srelbo closed this issue 2 years ago

srelbo commented 2 years ago

System Info

- `transformers` version: 4.20.0.dev0
- Platform: Linux-5.13.0-44-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): 1.10.0+cu113 (True)
- Tensorflow version (GPU?): 2.6.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

Who can help?

@patil-suraj

I have been trying to use the Hugging Face GPT-NeoX models to generate text. However, even the basic example use case fails on both CPU and GPU.

from transformers import GPTNeoXTokenizerFast, GPTNeoXForCausalLM, GPTNeoXConfig

if __name__ == "__main__":
    tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
    config = GPTNeoXConfig.from_pretrained("EleutherAI/gpt-neox-20b")
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", config=config)
    outputs = model.generate(
        inputs.input_ids,
        do_sample=True,
        temperature=0.9,
        max_length=100,
    )
    gen_text = tokenizer.batch_decode(outputs)[0]
    print(gen_text)

Fails with

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
  File "/home/joy/.venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate
    return self.sample(
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample
    outputs = self(
  File "/home/joy/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 590, in forward
    outputs = self.gpt_neox(
  File "/home/joy/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 482, in forward
    outputs = layer(
  File "/home/joy/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 290, in forward
    attention_layer_outputs = self.attention(
  File "/home/joy/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 148, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/joy/.venv/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 211, in _attn
    attn_output = torch.matmul(attn_weights, value)
RuntimeError: Expected batch2_sizes[0] == bs && batch2_sizes[1] == contraction_size to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

A similar failure happens when running on CUDA as well.

Based on the example at https://huggingface.co/docs/transformers/main/en/model_doc/gpt_neox

Reproduction

  1. Run the example script from https://huggingface.co/docs/transformers/main/en/model_doc/gpt_neox
  2. Observe the failure on both CPU and GPU

Expected behavior

The model should generate text.

On CUDA I have tried with `remove_invalid_values=True` but then the model produces garbage.
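
For reference, a minimal sketch of what the attention-mask warning in the log above is asking for, reusing the tokenizer, model, and inputs objects from the reproduction script (an illustration only; it silences the warning but does not address the matmul shape error):

outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # pass the mask explicitly, as the warning asks
    pad_token_id=tokenizer.eos_token_id,   # explicit pad token id for open-ended generation
    do_sample=True,
    temperature=0.9,
    max_length=100,
)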

srelbo commented 2 years ago

@zphang Would you be able to help us get this working? Thanks 🙏

benkrause commented 2 years ago

I am using my own custom generate function and didn't have the invalid-values problem, but I was also having issues with nonsensical generations that might be related. I found that this was due to a mistake in caching the previous attention states on line 146 of modeling_gpt_neox.py.

I think `present = None if use_cache else (key, value)` should be `present = (key, value) if use_cache else None`

Changing that fixed my issue. You may also be able to pass `use_cache=False` as an argument to `model.generate`, although that will make generation slightly slower.
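
For illustration, the `use_cache=False` workaround would look something like this with the reproduction script at the top of the issue (variable names assumed from that script, not taken from this comment):

outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    temperature=0.9,
    max_length=100,
    use_cache=False,  # skip the key/value cache path described above; slower, but avoids the bad cache
)
print(tokenizer.batch_decode(outputs)[0])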

zphang commented 2 years ago

Hi, sorry I've been busy. I should be able to take a look at this over the weekend.

srelbo commented 2 years ago

Thank you @benkrause and @zphang !

muhammad-ahmed-ghani commented 2 years ago

Hi @benkrause @zphang, I have the same issue. Can you please help me resolve it?

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
  File "app.py", line 15, in <module>
    use_cache=False
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/generation_utils.py", line 1330, in generate
    **model_kwargs,
  File "/opt/conda/lib/python3.7/site-packages/transformers/generation_utils.py", line 1975, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
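
A hypothetical diagnostic for this follow-up error (not from the thread; it assumes the model and inputs objects from the reproduction script): the `inf`/`nan` message from torch.multinomial usually means the logits themselves already contain NaN or inf, which can be checked directly:

import torch

with torch.no_grad():
    # single forward pass with the same inputs used for generation
    logits = model(inputs.input_ids, attention_mask=inputs.attention_mask).logits

print("NaN in logits:", torch.isnan(logits).any().item())
print("inf in logits:", torch.isinf(logits).any().item())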

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

binglun30 commented 1 year ago

Hi @benkrause @zphang, I have the same issue. Can you please help me resolve it?

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
  File "app.py", line 15, in <module>
    use_cache=False
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/generation_utils.py", line 1330, in generate
    **model_kwargs,
  File "/opt/conda/lib/python3.7/site-packages/transformers/generation_utils.py", line 1975, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I'm also experiencing this issue. Have you already solved it?