Open bw-wang19 opened 6 days ago
Hi ! datasets
uses Arrow as storage backend which is agnostic to deep learning frameworks like torch. If you want to get torch tensors back, you need to do dataset = dataset.with_format("torch")
Hi !
datasets
uses Arrow as storage backend which is agnostic to deep learning frameworks like torch. If you want to get torch tensors back, you need to dodataset = dataset.with_format("torch")
It does work! Thanks for your suggestion!
Describe the bug
I tried to add input_ids to dataset with map(), and I used the return_tensors='pt', but why I got the callback with the type of List?
Steps to reproduce the bug
Expected behavior
Sorry for this silly question, I'm noob on using this tool. But I think it should return a tensor value as I have used the protocol? When I tokenize only one sentence using tokenized_input=tokenizer(input, return_tensors='pt' ),it does return in tensor type. Why doesn't it work in map()?
Environment info
transformers>=4.41.2,<=4.45.0 datasets>=2.16.0,<=2.21.0 accelerate>=0.30.1,<=0.34.2 peft>=0.11.1,<=0.12.0 trl>=0.8.6,<=0.9.6 gradio>=4.0.0 pandas>=2.0.0 scipy einops sentencepiece tiktoken protobuf uvicorn pydantic fastapi sse-starlette matplotlib>=3.7.0 fire packaging pyyaml numpy<2.0.0