lbertge opened this issue 2 days ago
cc @ArthurZucker @gante
Hey! There is something wrong with the input ids, no? The shape is 3D; I have no idea what it means to have input ids (not input embeddings) of shape (32, 1, 128).
Hello @ArthurZucker!
I have a dataset composed of examples that I must tokenize individually. For instance, some examples in my dataset look like:
I tokenize each such example with the canonical call tokenizer(example, return_tensors="pt"), which returns a dict of tensors, each with shape (1, <len of tokens>).
I subsequently create a torch.utils.data.DataLoader around this tokenized dataset, so if, for example, I use a batch size of 32, my input_ids shape comes out as (32, 1, x). Does that make sense?
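For what it's worth, here is a minimal sketch (with made-up data) of how the extra dimension arises: the tokenizer keeps a leading batch dim of 1 on each example, and the DataLoader's default collate stacks those (1, L) tensors along a new batch dim. Squeezing dim 1 before the forward pass is the workaround I had in mind.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical stand-in for my tokenized dataset: each example is a dict
# whose tensor keeps the leading batch dim of 1 that the tokenizer returned.
examples = [{"input_ids": torch.randint(0, 100, (1, 128))} for _ in range(64)]

loader = DataLoader(examples, batch_size=32)
batch = next(iter(loader))
print(batch["input_ids"].shape)  # default collate stacks -> (32, 1, 128)

# Workaround: drop the extra dim before passing the batch to the model.
input_ids = batch["input_ids"].squeeze(1)
print(input_ids.shape)  # (32, 128)
```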
Feel free to close since I can figure out a workaround, although I am curious why there was a breaking change from 4.44 to 4.45 for this particular shape. Thanks for your consideration!
System Info
transformers version: 4.45.1

Who can help?
I think @gante was the only one who has touched transformers/models/gpt_neo/modeling_gpt_neo.py recently; would they be able to look into this issue or see if I have done something wrong?

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
Hello,
In transformers==4.44.2, the above code seems to work. In 4.45.1 this throws an error saying

expand(torch.FloatTensor{[32, 32, 1, 1, 128]}, size=[32, 1, 1, 128]): the number of sizes provided (4) must be greater or equal to the number of dimensions in the tensor (5)

Thank you for your consideration!
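The error itself can be reproduced in isolation (tensor names here are illustrative, not from the transformers internals): when the batch carries the extra dimension, the attention-mask tensor ends up 5-D, and expand() cannot map it to a 4-D target shape because it needs at least as many sizes as the tensor has dimensions.

```python
import torch

# Illustrative 5-D tensor: the spurious batch dim has leaked in up front.
mask = torch.ones(32, 32, 1, 1, 128)

err = None
try:
    # expand() to a 4-D shape fails: 4 sizes < 5 dims.
    mask.expand(32, 1, 1, 128)
except RuntimeError as e:
    err = e
print(err)
```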