huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

model.generate(**inputs) breaks when inputs are batched on GPU #27103

Closed vitalyshalumov closed 1 year ago

vitalyshalumov commented 1 year ago

System Info

Who can help?

No response

Reproduction

I'm calling model.generate on inputs that I've moved to the GPU, using an NLLB model.

What works:

  1. a string as input on CPU
  2. a string as input on GPU
  3. a batch as input on CPU

What breaks: a batch as input on GPU.

Example code: Translation from English to English

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
article = 'This does not work'

# works
# inputs = tokenizer([article, article, article, article, article], return_tensors="pt")
inputs = tokenizer.batch_encode_plus([article, article, article, article, article], return_tensors="pt")

# does not work
# inputs = tokenizer([article, article, article, article, article], return_tensors="pt").to("cuda")
# inputs = tokenizer.batch_encode_plus([article, article, article, article, article], return_tensors="pt").to("cuda")

translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"])
translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)

The error given is:

  File ..... in <module>
    translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
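
For context, the wrapper_CUDA__index_select in the message points at an index/embedding lookup: the input IDs are on cuda:0 while the model's weights are still on the CPU. A minimal sketch that reproduces the same class of error (the layer and sizes here are illustrative, not taken from NLLB):

import torch

embedding = torch.nn.Embedding(10, 4)               # weights stay on the CPU
token_ids = torch.tensor([1, 2, 3], device="cuda")  # indices live on the GPU

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cpu and cuda:0!
output = embedding(token_ids)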

Expected behavior

Inference should work on batched inputs that have been moved to the GPU.

ArthurZucker commented 1 year ago

It doesn't seem like the model was put on the device when you did inputs.to("cuda")! Did you try calling model.to("cuda") as well?
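
For reference, a minimal sketch of the suggested fix, moving the model to the same device as the inputs (otherwise identical to the original reproduction):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B").to("cuda")

article = 'This does not work'
inputs = tokenizer([article] * 5, return_tensors="pt").to("cuda")  # inputs and model both on cuda:0

translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"])
translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)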

vitalyshalumov commented 1 year ago

model.to('cuda') resolves the issue. Thanks!