[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)
Reproduction
I'm calling `generate` on inputs that I moved to the GPU. I'm using an NLLB model.
When everything works:
when using a string as an input on cpu
when using a string as an input on gpu
when using a batch as an input on cpu
When it breaks:
when using a batch as an input on gpu:
Example code: Translation from English to English
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
article = "This does not work"
# works (inputs stay on CPU)
# inputs = tokenizer([article, article, article, article, article], return_tensors="pt")
inputs = tokenizer.batch_encode_plus([article, article, article, article, article], return_tensors="pt")
# does not work (inputs moved to GPU)
# inputs = tokenizer([article, article, article, article, article], return_tensors="pt").to("cuda")
# inputs = tokenizer.batch_encode_plus([article, article, article, article, article], return_tensors="pt").to("cuda")
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"])
translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
The error given is:
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
File ..... in <module>
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
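A note on what this error usually means: PyTorch raises it when a module's weights and its input tensor live on different devices. In the example code the tokenizer output is moved with `.to("cuda")` but the model is never moved, so one plausible cause (an assumption, not confirmed here) is that the embedding lookup receives CUDA indices while its weight table is still on the CPU. The sketch below reproduces that situation with a small stand-in `torch.nn.Embedding` instead of the NLLB model; `devices_match` is a hypothetical helper written for this illustration, not part of PyTorch or transformers.

```python
import torch

# Stand-in for the model's embedding layer; modules are created on CPU by default.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([[1, 2, 3], [4, 5, 6]])  # a batch of two "sentences"


def devices_match(module, tensor):
    """Hypothetical helper: True if every parameter is on the tensor's device."""
    return all(p.device == tensor.device for p in module.parameters())


# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Moving only the inputs mirrors the failing case in the report:
# on a CUDA machine the weights are still on CPU at this point.
token_ids = token_ids.to(device)
print("devices match before moving the module:", devices_match(embedding, token_ids))

# Moving the module as well puts weights and inputs on the same device.
embedding = embedding.to(device)
print("devices match after moving the module:", devices_match(embedding, token_ids))

output = embedding(token_ids)
print(output.shape, output.device)
```

On a CPU-only machine both checks print `True`, since nothing ever leaves the CPU; the mismatch only shows up when CUDA is available.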
System Info
transformers version: 4.34.1
Who can help?
No response
Expected behavior
Inference should work on batched inputs that are on the GPU.
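A minimal sketch of how batched GPU inference might be written, assuming the failure comes from the model still being on the CPU (this is not confirmed in the report). `translate_batch` and `main` are hypothetical names introduced for this illustration. The target language id is looked up with `tokenizer.convert_tokens_to_ids` rather than `lang_code_to_id`, and `padding=True` is added so that sentences of different lengths can share one batch; both are choices made for this sketch.

```python
import torch


def translate_batch(texts, model, tokenizer, target_lang="eng_Latn", device=None):
    """Hypothetical helper: run generate with model and inputs on one device."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    # Move the model weights to the target device.
    model = model.to(device)

    # Tokenize the whole batch and move the resulting tensors to the same device.
    inputs = tokenizer(texts, return_tensors="pt", padding=True).to(device)

    with torch.no_grad():
        translated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
        )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)


def main():
    # Imported here so the helper above only depends on torch.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "facebook/nllb-200-distilled-1.3B"  # checkpoint from the report
    tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    print(translate_batch(["This does not work"] * 5, model, tokenizer))
```

Calling `main()` downloads the checkpoint named in the report and translates the same five-sentence batch; it is left uncalled here because of the download size.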