huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

DataCollatorWithFlattening is incompatible with non-list input ids #33946

Open alex-hh opened 2 weeks ago

alex-hh commented 2 weeks ago

System Info

latest transformers

Who can help?

@ArthurZucker

Reproduction

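from transformers import DataCollatorWithFlattening, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
collator = DataCollatorWithFlattening()
# return_tensors="pt" makes every feature a torch.Tensor of shape (1, seq_len)
tensor_example = tokenizer("A test sentence", return_tensors="pt")
# Drop the batch dimension: each feature is now a 1-D tensor rather than a list of ints
example = {k: v.flatten() for k, v in tensor_example.items()}
collator([example] * 2)  # does not work: the collator assumes list-valued input_ids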

Expected behavior

The collator should work with all output types the tokenizer supports, e.g. PyTorch tensors and NumPy arrays as well as plain Python lists.
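One possible workaround until the collator accepts tensors, sketched under the assumption that any feature value exposing `.tolist()` (torch.Tensor, numpy.ndarray, jax arrays) can safely be turned into a plain list: convert the features first, then pack them. `to_list_features` and `flatten_features` are hypothetical helpers, not part of the transformers API. `flatten_features` is only a simplified, list-only approximation of the packing `DataCollatorWithFlattening` appears to perform (each sequence's first label replaced by `separator_id`, position ids restarting per sequence), and it returns lists rather than a batched tensor.

```python
from typing import Any, Dict, List


def to_list_features(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: convert tensor/array-valued features to Python lists.

    Uses duck typing, so any value with a .tolist() method is converted;
    values that are already lists are passed through unchanged.
    """
    return [
        {k: v.tolist() if hasattr(v, "tolist") else v for k, v in feature.items()}
        for feature in features
    ]


def flatten_features(
    features: List[Dict[str, Any]], separator_id: int = -100
) -> Dict[str, List[int]]:
    """Hypothetical, simplified sketch of packing several examples into one row.

    Works only on lists, which is why the features are normalized first.
    """
    features = to_list_features(features)
    out: Dict[str, List[int]] = {"input_ids": [], "labels": [], "position_ids": []}
    for feature in features:
        ids = feature["input_ids"]
        # List concatenation: this is the step that assumes list-valued input_ids
        out["input_ids"] += ids
        # Mark the start of each packed sequence with the separator label
        out["labels"] += [separator_id] + ids[1:]
        # Position ids restart at 0 for every packed sequence
        out["position_ids"] += list(range(len(ids)))
    return out
```

With the reproduction above, `collator(to_list_features([example] * 2))` should hand the collator plain lists. Normalizing up front keeps the packing logic list-only; a fix inside the collator itself would presumably need a similar conversion step.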

gaurangk19 commented 2 weeks ago

Hi! I am planning to work on this as part of Hacktoberfest 2024. Could you assign this issue to me? I hope I am able to solve it.

ArthurZucker commented 1 week ago

Hey! We do not assign issues; feel free to open a PR 🤗