huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

RuntimeError: expand(torch.FloatTensor{ ... }, size = [...]) the number of sizes provided (4) must be greater or equal to the number of dimensions in the tensor (5) #33740

Open lbertge opened 2 days ago

lbertge commented 2 days ago

Who can help?

I think @gante is the only one who has touched transformers/models/gpt_neo/modeling_gpt_neo.py recently; would they be able to look into this issue, or tell me if I have done something wrong?

Reproduction

import torch
from transformers import GPTNeoForCausalLM

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

# Note the extra singleton dimension: every tensor is (batch=32, 1, seq_len=128)
# rather than the usual 2-D (batch, seq_len).
input_ids = torch.randint(0, 50256, (32, 1, 128)).long()
attention_mask = torch.ones((32, 1, 128)).long()
labels = torch.randint(0, 50256, (32, 1, 128)).long()

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

Expected behavior

Hello,

In transformers==4.44.2, the above code runs without error. In 4.45.1 it throws: expand(torch.FloatTensor{[32, 32, 1, 1, 128]}, size=[32, 1, 1, 128]): the number of sizes provided (4) must be greater or equal to the number of dimensions in the tensor (5). If I read the message right, the mask the model builds internally has picked up an extra leading dimension (it is 5-D) and can no longer be expanded to the expected 4-D size; I assume this comes from the 3-D attention_mask I pass in.

Thank you for your consideration!

LysandreJik commented 1 day ago

cc @ArthurZucker @gante

ArthurZucker commented 1 day ago

Hey! There is something wrong with the input ids, no? The shape is 3-D; I have no idea what it means to have input ids (not input embeddings) of shape (32, 1, 128).

lbertge commented 2 hours ago

hello @ArthurZucker!

I have a dataset composed of examples that I must tokenize individually. For instance, some examples in my dataset look like

  1. XXX = ?
  2. YYY = ?

I tokenize each such example with the canonical call tokenizer(example, return_tensors="pt"), which returns a dict of tensors, each of shape (1, <len of tokens>).

I then wrap this tokenized dataset in a torch.utils.data.DataLoader, so with a batch size of, say, 32, my input_ids come out with shape (32, 1, x); a minimal sketch follows. Does that make sense?
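For concreteness, here is a sketch of that pipeline (the example strings and the padding length are illustrative, not my actual data):

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo's tokenizer has no pad token by default

# Tokenizing one example at a time with return_tensors="pt" yields tensors of
# shape (1, seq_len), not (seq_len,).
examples = ["1. XXX = ?", "2. YYY = ?"] * 16
dataset = [
    dict(tokenizer(ex, return_tensors="pt", padding="max_length", max_length=128))
    for ex in examples
]

# torch's default collate then stacks the (1, 128) tensors along a new batch
# dimension, producing (32, 1, 128) instead of the usual (32, 128).
loader = DataLoader(dataset, batch_size=32)
batch = next(iter(loader))
print(batch["input_ids"].shape)  # torch.Size([32, 1, 128])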

Feel free to close this since I can work around it (sketch below), although I am curious why this particular shape broke between 4.44 and 4.45. Thanks for your consideration!
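In case anyone else hits this, the workaround I have in mind is simply to drop the singleton dimension before the forward pass (a sketch, continuing from the batch above; the labels line is a placeholder):

# Collapse (batch, 1, seq_len) -> (batch, seq_len), the 2-D shape the model expects.
input_ids = batch["input_ids"].squeeze(1)
attention_mask = batch["attention_mask"].squeeze(1)
labels = input_ids.clone()  # placeholder; substitute your real labels

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)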