Closed: liyier90 closed this issue 1 year ago
cc @younesbelkada
Hi @liyier90
Thanks! It sounds like the `_no_split_modules` was not properly checked; I think the fix should be to replace the current `_no_split_modules` with the ones you have defined.
Is this block:
# Demonstrate that only "model.encoder.layer_norm" and "model.encoder.embed_tokens"
# needs to be on the same device as the input
for module, device in device_map.items():
if module in {"model.encoder.layer_norm", "model.encoder.embed_tokens"}:
if device != 0:
device_map[module] = 0
else:
if device == 0:
device_map[module] = 1
necessary? I think accelerate automatically takes care of moving the input to the correct device through hooks. What happens if you remove it in your case and just use the correct `_no_split_modules`?
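For reference, the device-pinning block being discussed can be generalized into a small helper that works purely on the device-map dict. This is a sketch with a hypothetical name (`pin_with_input`); unlike the in-place loop above, it returns a patched copy:

```python
def pin_with_input(device_map, pinned, input_device=0, spill_device=1):
    """Return a copy of an accelerate-style device map in which every module
    in `pinned` is forced onto `input_device`, and any other module currently
    on `input_device` is spilled to `spill_device` (mirroring the loop above)."""
    patched = dict(device_map)
    for module, device in patched.items():
        if module in pinned:
            # These modules must sit on the same device as the input tensors.
            patched[module] = input_device
        elif device == input_device:
            # Free up room on the input device for the pinned modules.
            patched[module] = spill_device
    return patched
```

The patched dict can then be passed as `device_map=` to `from_pretrained` instead of `"auto"`.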
If I comment out that block, I get the following error:
╭───────────────────────────── Traceback (most recent call last) ─────────────────────────────╮
  <path>/code/nscc_working/engr/multi_node/nllb_inference/correct_infer.py:66 in <module>

     63
     64
     65 if __name__ == "__main__":
  ❱  66     main()
     67

  <path>/code/nscc_working/engr/multi_node/nllb_inference/correct_infer.py:58 in main

     55         if torch.is_tensor(inputs[i]):
     56             inputs[i] = inputs[i].to("cuda:0")
     57
  ❱  58     translated_tokens = model.generate(
     59         **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
     60     )
     61     outputs = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)

  <path>/.conda/envs/megatron/lib/python3.8/site-packages/torch/utils/_contextlib.py:115
  in decorate_context

    112     @functools.wraps(func)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
  ❱ 115             return func(*args, **kwargs)
    116
    117     return decorate_context
    118

  <path>/.conda/envs/megatron/lib/python3.8/site-packages/transformers/generation/utils.py:1437
  in generate

   1434                 )
   1435
   1436             # 11. run greedy search
  ❱ 1437             return self.greedy_search(
   1438                 input_ids,
   1439                 logits_processor=logits_processor,
   1440                 stopping_criteria=stopping_criteria,

  <path>/.conda/envs/megatron/lib/python3.8/site-packages/transformers/generation/utils.py:2288
  in greedy_search

   2285             if eos_token_id is not None:
   2286                 if pad_token_id is None:
   2287                     raise ValueError("If `eos_token_id` is defined, make sure that `
  ❱ 2288                 next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1
   2289
   2290             # update generated ids, model inputs, and length for next step
   2291             input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices,
cuda:1 and cuda:0!
This is because `model.encoder.layer_norm` got put on device 1:
{'lm_head': 0,
'model.decoder.embed_positions': 1,
'model.decoder.embed_tokens': 1,
'model.decoder.layer_norm': 2,
'model.decoder.layers.0': 1,
'model.decoder.layers.1': 1,
'model.decoder.layers.10': 2,
'model.decoder.layers.11': 2,
'model.decoder.layers.12': 2,
'model.decoder.layers.13': 2,
'model.decoder.layers.14': 2,
'model.decoder.layers.15': 2,
'model.decoder.layers.16': 2,
'model.decoder.layers.17': 2,
'model.decoder.layers.18': 2,
'model.decoder.layers.19': 2,
'model.decoder.layers.2': 1,
'model.decoder.layers.20': 2,
'model.decoder.layers.21': 2,
'model.decoder.layers.22': 2,
'model.decoder.layers.23': 2,
'model.decoder.layers.3': 1,
'model.decoder.layers.4': 1,
'model.decoder.layers.5': 1,
'model.decoder.layers.6': 1,
'model.decoder.layers.7': 2,
'model.decoder.layers.8': 2,
'model.decoder.layers.9': 2,
'model.encoder.embed_positions': 0,
'model.encoder.embed_tokens': 0,
'model.encoder.layer_norm': 1,
'model.encoder.layers.0': 0,
'model.encoder.layers.1': 0,
'model.encoder.layers.10': 1,
'model.encoder.layers.11': 1,
'model.encoder.layers.12': 1,
'model.encoder.layers.13': 1,
'model.encoder.layers.14': 1,
'model.encoder.layers.15': 1,
'model.encoder.layers.16': 1,
'model.encoder.layers.17': 1,
'model.encoder.layers.18': 1,
'model.encoder.layers.19': 1,
'model.encoder.layers.2': 0,
'model.encoder.layers.20': 1,
'model.encoder.layers.21': 1,
'model.encoder.layers.22': 1,
'model.encoder.layers.23': 1,
'model.encoder.layers.3': 1,
'model.encoder.layers.4': 1,
'model.encoder.layers.5': 1,
'model.encoder.layers.6': 1,
'model.encoder.layers.7': 1,
'model.encoder.layers.8': 1,
'model.encoder.layers.9': 1,
'model.shared': 0}
It could be because I'm moving all inputs to device 0. But if I remove the

for i in inputs:
    if torch.is_tensor(inputs[i]):
        inputs[i] = inputs[i].to("cuda:0")

block, I get:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices,
cuda:1 and cpu!
~Hey, thanks for reporting! From the look of it, this seems like an accelerate issue rather than a transformers issue (accelerate should be moving the layers to the correct device on its own, and `_no_split_modules` does not force individual layers onto the same device). Could you open an issue over there?~
edit: I got confused by the only two layers that you had to put on another device; @younesbelkada explained offline what he thinks should fix it!
I don't see where the error in Accelerate lies. No layer that is not supposed to be split has been split, so the issue is definitely a Transformers one.
Yeah, I think it definitely has to do with `_no_split_modules` not being set correctly. Having a look now.
@liyier90
I made https://github.com/huggingface/transformers/pull/23758, which should fix your issue.
Also make sure to put the input ids on the same device as your lm head; otherwise you will get device mismatch issues in `generate`.
The snippet I used is the one below, on a 2xNVIDIA A100 80GB:
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_name = "facebook/nllb-moe-54b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto",
load_in_8bit=True,
)
batched_input = [
'We now have 4-month-old mice that are non-diabetic that used to be diabetic," he added.',
"Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical and scientific division of the Canadian Diabetes Association cautioned that the research is still in its early days."
"Like some other experts, he is skeptical about whether diabetes can be cured, noting that these findings have no relevance to people who already have Type 1 diabetes."
"On Monday, Sara Danius, permanent secretary of the Nobel Committee for Literature at the Swedish Academy, publicly announced during a radio program on Sveriges Radio in Sweden the committee, unable to reach Bob Dylan directly about winning the 2016 Nobel Prize in Literature, had abandoned its efforts to reach him.",
'Danius said, "Right now we are doing nothing. I have called and sent emails to his closest collaborator and received very friendly replies. For now, that is certainly enough."',
"Previously, Ring's CEO, Jamie Siminoff, remarked the company started when his doorbell wasn't audible from his shop in his garage.",
]
inputs = tokenizer(batched_input, return_tensors="pt", padding=True).to(1)
translated_tokens = model.generate(
**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
)
outputs = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
print(outputs)
I had to put the input on device 1 because, in my case, the lm head was on device 1. You can retrieve that device with

lm_head_device = model.hf_device_map["lm_head"]
And the result I get is:
['Nous avons maintenant des souris de 4 mois qui ne sont pas diabรฉtiques mais qui l\'รฉtaient", a-t-il ajoutรฉ.', "Le Dr Ehud Ur, professeur de mรฉdecine ร l'Universitรฉ Dalhousie ร Halifax, en Nouvelle-รcosse, et prรฉsident de la division clinique et scientifique de l'Association canadienne du diabรจte, a averti que la recherche en รฉtait encore ร ses dรฉbuts. Comme d'autres experts, il est sceptique quant ร la possibilitรฉ de guรฉrir le diabรจte, notant que ces rรฉsultats n'ont aucune pertinence pour les personnes atteintes de diabรจte de type 1.", 'Danius a dรฉclarรฉ: "Pour le moment, nous ne faisons rien. J\'ai appelรฉ et envoyรฉ des courriels ร son plus proche collaborateur et j\'ai reรงu des rรฉponses trรจs amicales. Pour l\'instant, c\'est certainement suffisant".', "Auparavant, le PDG de Ring, Jamie Siminoff, a dรฉclarรฉ que la sociรฉtรฉ avait commencรฉ lorsque sa sonnette n'รฉtait pas audible depuis son magasin dans son garage."]
@younesbelkada
Unfortunately, I don't think the changes in the PR were sufficient to resolve the error.
I updated transformers to include the fix using

pip install git+https://github.com/huggingface/transformers

The latest commit on the main branch was https://github.com/huggingface/transformers/commit/f67dac97bdc63874f2288546b3fa87e69d2ea1c8.
I ran the code snippet you provided, but on 4 x A100 40GB, as I do not have access to 80GB cards. I modified it to move the input to the same device as lm_head based on your advice.
import os
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_name = "facebook/nllb-moe-54b"
cache_dir = <path>
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto",
load_in_8bit=True,
cache_dir=cache_dir,
)
batched_input = [
'We now have 4-month-old mice that are non-diabetic that used to be diabetic," he added.',
"Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical and scientific division of the Canadian Diabetes Association cautioned that the research is still in its early days."
"Like some other experts, he is skeptical about whether diabetes can be cured, noting that these findings have no relevance to people who already have Type 1 diabetes."
"On Monday, Sara Danius, permanent secretary of the Nobel Committee for Literature at the Swedish Academy, publicly announced during a radio program on Sveriges Radio in Sweden the committee, unable to reach Bob Dylan directly about winning the 2016 Nobel Prize in Literature, had abandoned its efforts to reach him.",
'Danius said, "Right now we are doing nothing. I have called and sent emails to his closest collaborator and received very friendly replies. For now, that is certainly enough."',
"Previously, Ring's CEO, Jamie Siminoff, remarked the company started when his doorbell wasn't audible from his shop in his garage.",
]
inputs = tokenizer(batched_input, return_tensors="pt", padding=True).to(
model.hf_device_map["lm_head"]
)
translated_tokens = model.generate(
**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
)
outputs = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
print(outputs)
But I am still getting an "Expected all tensors to be on the same device" error.
╭─────────────────────────────── Traceback (most recent call last) ───────────────────────────────╮
  /home/users/nus/yier/code/nscc_working/engr/multi_node/nllb_inference/sample_infer.py:31
  in <module>

     28     model.hf_device_map["lm_head"]
     29 )
     30
  ❱  31 translated_tokens = model.generate(
     32     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
     33 )
     34 outputs = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)

  /home/users/nus/yier/.conda/envs/megatron/lib/python3.8/site-packages/torch/utils/_contextlib.py
  :115 in decorate_context

    112     @functools.wraps(func)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
  ❱ 115             return func(*args, **kwargs)
    116
    117     return decorate_context
    118

  /home/users/nus/yier/.conda/envs/megatron/lib/python3.8/site-packages/transformers/generation/
  utils.py:1518 in generate

   1515                 )
   1516
   1517             # 11. run greedy search
  ❱ 1518             return self.greedy_search(
   1519                 input_ids,
   1520                 logits_processor=logits_processor,
   1521                 stopping_criteria=stopping_criteria,

  /home/users/nus/yier/.conda/envs/megatron/lib/python3.8/site-packages/transformers/generation/
  utils.py:2375 in greedy_search

   2372             if eos_token_id is not None:
   2373             	if pad_token_id is None:
   2374                     raise ValueError("If `eos_token_id` is defined, make sure that `pad_
  ❱ 2375                 next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - u
   2376
   2377             # update generated ids, model inputs, and length for next step
   2378             input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
I notice that one of the layers I moved in my earlier snippets (`model.encoder.layer_norm`) was on cuda:2.
{'lm_head': 0,
'model.decoder.embed_positions': 2,
'model.decoder.embed_tokens': 2,
'model.decoder.layer_norm': 3,
'model.decoder.layers.0': 2,
'model.decoder.layers.1': 2,
'model.decoder.layers.10': 3,
'model.decoder.layers.11': 3,
'model.decoder.layers.12': 3,
'model.decoder.layers.13': 3,
'model.decoder.layers.14': 3,
'model.decoder.layers.15': 3,
'model.decoder.layers.16': 3,
'model.decoder.layers.17': 3,
'model.decoder.layers.18': 3,
'model.decoder.layers.19': 3,
'model.decoder.layers.2': 2,
'model.decoder.layers.20': 3,
'model.decoder.layers.21': 3,
'model.decoder.layers.22': 3,
'model.decoder.layers.23': 3,
'model.decoder.layers.3': 2,
'model.decoder.layers.4': 2,
'model.decoder.layers.5': 2,
'model.decoder.layers.6': 2,
'model.decoder.layers.7': 3,
'model.decoder.layers.8': 3,
'model.decoder.layers.9': 3,
'model.encoder.embed_positions': 0,
'model.encoder.embed_tokens': 0,
'model.encoder.layer_norm': 2,
'model.encoder.layers.0': 0,
'model.encoder.layers.1': 0,
'model.encoder.layers.10': 1,
'model.encoder.layers.11': 1,
'model.encoder.layers.12': 1,
'model.encoder.layers.13': 1,
'model.encoder.layers.14': 1,
'model.encoder.layers.15': 1,
'model.encoder.layers.16': 1,
'model.encoder.layers.17': 1,
'model.encoder.layers.18': 1,
'model.encoder.layers.19': 2,
'model.encoder.layers.2': 0,
'model.encoder.layers.20': 2,
'model.encoder.layers.21': 2,
'model.encoder.layers.22': 2,
'model.encoder.layers.23': 2,
'model.encoder.layers.3': 0,
'model.encoder.layers.4': 0,
'model.encoder.layers.5': 0,
'model.encoder.layers.6': 0,
'model.encoder.layers.7': 1,
'model.encoder.layers.8': 1,
'model.encoder.layers.9': 1,
'model.shared': 0}
The code ran successfully after I moved `model.encoder.layer_norm` to cuda:0 while keeping the rest of the device mapping untouched.
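That workaround can be expressed as a plain device-map override before reloading the model. Below is a minimal sketch with a hypothetical helper name (`override_device`); the `from_pretrained` reload is shown only as a comment, since it assumes the pinned layer actually fits on the target GPU:

```python
def override_device(device_map, module, device):
    """Return a copy of a device map with one module pinned to a given device."""
    if module not in device_map:
        raise KeyError(f"{module!r} not found in device map")
    patched = dict(device_map)
    patched[module] = device
    return patched

# Usage sketch (not run here):
# patched = override_device(model.hf_device_map, "model.encoder.layer_norm", 0)
# model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map=patched, ...)
```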
Please let me know if I made any mistakes in trying out your solution or if I should be raising this in the Accelerate repo instead. Thanks!
I am having the same issue. I installed transformers after the fix, and I still get RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Unfortunately, I only have three A100 40GB GPUs that I can use.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model_name = "nllb_image/nllb-moe-54b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=True,
)
inputs = tokenizer("test", return_tensors="pt").to(model.hf_device_map["lm_head"])
translated_tokens = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"], max_length=512
)
decoded_sentence = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print(decoded_sentence)
expected result: translated "test" (french)
actual result: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Am I doing anything wrong?
{
"model.shared":0,
"lm_head":0,
"model.encoder.embed_tokens":0,
"model.encoder.embed_positions":0,
"model.encoder.layers.0":0,
"model.encoder.layers.1":0,
"model.encoder.layers.2":0,
"model.encoder.layers.3":0,
"model.encoder.layers.4":0,
"model.encoder.layers.5":0,
"model.encoder.layers.6":0,
"model.encoder.layers.7":0,
"model.encoder.layers.8":0,
"model.encoder.layers.9":0,
"model.encoder.layers.10":0,
"model.encoder.layers.11":0,
"model.encoder.layers.12":0,
"model.encoder.layers.13":0,
"model.encoder.layers.14":0,
"model.encoder.layers.15.self_attn":0,
"model.encoder.layers.15.attn_dropout":0,
"model.encoder.layers.15.self_attn_layer_norm":0,
"model.encoder.layers.15.ffn.router":0,
"model.encoder.layers.15.ffn.token_dropout":0,
"model.encoder.layers.15.ffn.experts.expert_0":0,
"model.encoder.layers.15.ffn.experts.expert_1":0,
"model.encoder.layers.15.ffn.experts.expert_2":0,
"model.encoder.layers.15.ffn.experts.expert_3":0,
"model.encoder.layers.15.ffn.experts.expert_4":0,
"model.encoder.layers.15.ffn.experts.expert_5":0,
"model.encoder.layers.15.ffn.experts.expert_6":0,
"model.encoder.layers.15.ffn.experts.expert_7":0,
"model.encoder.layers.15.ffn.experts.expert_8":0,
"model.encoder.layers.15.ffn.experts.expert_9":0,
"model.encoder.layers.15.ffn.experts.expert_10":0,
"model.encoder.layers.15.ffn.experts.expert_11":0,
"model.encoder.layers.15.ffn.experts.expert_12":0,
"model.encoder.layers.15.ffn.experts.expert_13":0,
"model.encoder.layers.15.ffn.experts.expert_14":0,
"model.encoder.layers.15.ffn.experts.expert_15":0,
"model.encoder.layers.15.ffn.experts.expert_16":0,
"model.encoder.layers.15.ffn.experts.expert_17":0,
"model.encoder.layers.15.ffn.experts.expert_18":0,
"model.encoder.layers.15.ffn.experts.expert_19":0,
"model.encoder.layers.15.ffn.experts.expert_20":0,
"model.encoder.layers.15.ffn.experts.expert_21":0,
"model.encoder.layers.15.ffn.experts.expert_22":0,
"model.encoder.layers.15.ffn.experts.expert_23":0,
"model.encoder.layers.15.ffn.experts.expert_24":0,
"model.encoder.layers.15.ffn.experts.expert_25":0,
"model.encoder.layers.15.ffn.experts.expert_26":0,
"model.encoder.layers.15.ffn.experts.expert_27":0,
"model.encoder.layers.15.ffn.experts.expert_28":0,
"model.encoder.layers.15.ffn.experts.expert_29":0,
"model.encoder.layers.15.ffn.experts.expert_30":0,
"model.encoder.layers.15.ffn.experts.expert_31":0,
"model.encoder.layers.15.ffn.experts.expert_32":0,
"model.encoder.layers.15.ffn.experts.expert_33":0,
"model.encoder.layers.15.ffn.experts.expert_34":0,
"model.encoder.layers.15.ffn.experts.expert_35":0,
"model.encoder.layers.15.ffn.experts.expert_36":0,
"model.encoder.layers.15.ffn.experts.expert_37":0,
"model.encoder.layers.15.ffn.experts.expert_38":0,
"model.encoder.layers.15.ffn.experts.expert_39":0,
"model.encoder.layers.15.ffn.experts.expert_40":0,
"model.encoder.layers.15.ffn.experts.expert_41":0,
"model.encoder.layers.15.ffn.experts.expert_42":0,
"model.encoder.layers.15.ffn.experts.expert_43":0,
"model.encoder.layers.15.ffn.experts.expert_44":0,
"model.encoder.layers.15.ffn.experts.expert_45":0,
"model.encoder.layers.15.ffn.experts.expert_46":0,
"model.encoder.layers.15.ffn.experts.expert_47":0,
"model.encoder.layers.15.ffn.experts.expert_48":0,
"model.encoder.layers.15.ffn.experts.expert_49":0,
"model.encoder.layers.15.ffn.experts.expert_50":0,
"model.encoder.layers.15.ffn.experts.expert_51":0,
"model.encoder.layers.15.ffn.experts.expert_52":0,
"model.encoder.layers.15.ffn.experts.expert_53":0,
"model.encoder.layers.15.ffn.experts.expert_54":0,
"model.encoder.layers.15.ffn.experts.expert_55":0,
"model.encoder.layers.15.ffn.experts.expert_56":0,
"model.encoder.layers.15.ffn.experts.expert_57":0,
"model.encoder.layers.15.ffn.experts.expert_58":0,
"model.encoder.layers.15.ffn.experts.expert_59":0,
"model.encoder.layers.15.ffn.experts.expert_60":0,
"model.encoder.layers.15.ffn.experts.expert_61":0,
"model.encoder.layers.15.ffn.experts.expert_62":0,
"model.encoder.layers.15.ffn.experts.expert_63":0,
"model.encoder.layers.15.ffn.experts.expert_64":0,
"model.encoder.layers.15.ffn.experts.expert_65":0,
"model.encoder.layers.15.ffn.experts.expert_66":0,
"model.encoder.layers.15.ffn.experts.expert_67":0,
"model.encoder.layers.15.ffn.experts.expert_68":0,
"model.encoder.layers.15.ffn.experts.expert_69":0,
"model.encoder.layers.15.ffn.experts.expert_70":0,
"model.encoder.layers.15.ffn.experts.expert_71":0,
"model.encoder.layers.15.ffn.experts.expert_72":0,
"model.encoder.layers.15.ffn.experts.expert_73":0,
"model.encoder.layers.15.ffn.experts.expert_74":0,
"model.encoder.layers.15.ffn.experts.expert_75":0,
"model.encoder.layers.15.ffn.experts.expert_76":0,
"model.encoder.layers.15.ffn.experts.expert_77":0,
"model.encoder.layers.15.ffn.experts.expert_78":0,
"model.encoder.layers.15.ffn.experts.expert_79":0,
"model.encoder.layers.15.ffn.experts.expert_80":0,
"model.encoder.layers.15.ffn.experts.expert_81":0,
"model.encoder.layers.15.ffn.experts.expert_82":0,
"model.encoder.layers.15.ffn.experts.expert_83":0,
"model.encoder.layers.15.ffn.experts.expert_84":0,
"model.encoder.layers.15.ffn.experts.expert_85":0,
"model.encoder.layers.15.ffn.experts.expert_86":0,
"model.encoder.layers.15.ffn.experts.expert_87":0,
"model.encoder.layers.15.ffn.experts.expert_88":0,
"model.encoder.layers.15.ffn.experts.expert_89":0,
"model.encoder.layers.15.ffn.experts.expert_90":0,
"model.encoder.layers.15.ffn.experts.expert_91":0,
"model.encoder.layers.15.ffn.experts.expert_92":0,
"model.encoder.layers.15.ffn.experts.expert_93":0,
"model.encoder.layers.15.ffn.experts.expert_94":0,
"model.encoder.layers.15.ffn.experts.expert_95":0,
"model.encoder.layers.15.ffn.experts.expert_96":0,
"model.encoder.layers.15.ffn.experts.expert_97":0,
"model.encoder.layers.15.ffn.experts.expert_98":0,
"model.encoder.layers.15.ffn.experts.expert_99":0,
"model.encoder.layers.15.ffn.experts.expert_100":0,
"model.encoder.layers.15.ffn.experts.expert_102":1,
"model.encoder.layers.15.ffn.experts.expert_103":1,
"model.encoder.layers.15.ffn.experts.expert_104":1,
"model.encoder.layers.15.ffn.experts.expert_105":1,
"model.encoder.layers.15.ffn.experts.expert_106":1,
"model.encoder.layers.15.ffn.experts.expert_107":1,
"model.encoder.layers.15.ffn.experts.expert_108":1,
"model.encoder.layers.15.ffn.experts.expert_109":1,
"model.encoder.layers.15.ffn.experts.expert_110":1,
"model.encoder.layers.15.ffn.experts.expert_111":1,
"model.encoder.layers.15.ffn.experts.expert_112":1,
"model.encoder.layers.15.ffn.experts.expert_113":1,
"model.encoder.layers.15.ffn.experts.expert_114":1,
"model.encoder.layers.15.ffn.experts.expert_115":1,
"model.encoder.layers.15.ffn.experts.expert_116":1,
"model.encoder.layers.15.ffn.experts.expert_117":1,
"model.encoder.layers.15.ffn.experts.expert_118":1,
"model.encoder.layers.15.ffn.experts.expert_119":1,
"model.encoder.layers.15.ffn.experts.expert_120":1,
"model.encoder.layers.15.ffn.experts.expert_121":1,
"model.encoder.layers.15.ffn.experts.expert_122":1,
"model.encoder.layers.15.ffn.experts.expert_123":1,
"model.encoder.layers.15.ffn.experts.expert_124":1,
"model.encoder.layers.15.ffn.experts.expert_125":1,
"model.encoder.layers.15.ffn.experts.expert_126":1,
"model.encoder.layers.15.ffn.experts.expert_127":1,
"model.encoder.layers.15.ff_layer_norm":1,
"model.encoder.layers.15.ff_dropout":1,
"model.encoder.layers.16":1,
"model.encoder.layers.17":1,
"model.encoder.layers.18":1,
"model.encoder.layers.19":1,
"model.encoder.layers.20":1,
"model.encoder.layers.21":1,
"model.encoder.layers.22":1,
"model.encoder.layers.23":1,
"model.encoder.layer_norm":1,
"model.decoder.embed_tokens":1,
"model.decoder.embed_positions":1,
"model.decoder.layers.0":1,
"model.decoder.layers.1":1,
"model.decoder.layers.2":1,
"model.decoder.layers.3":1,
"model.decoder.layers.4":1,
"model.decoder.layers.5":1,
"model.decoder.layers.6":1,
"model.decoder.layers.7.self_attn":1,
"model.decoder.layers.7.activation_fn":1,
"model.decoder.layers.7.attn_dropout":1,
"model.decoder.layers.7.self_attn_layer_norm":1,
"model.decoder.layers.7.cross_attention":1,
"model.decoder.layers.7.cross_attention_layer_norm":1,
"model.decoder.layers.7.ffn.router":1,
"model.decoder.layers.7.ffn.token_dropout":1,
"model.decoder.layers.7.ffn.experts.expert_0":1,
"model.decoder.layers.7.ffn.experts.expert_1":1,
"model.decoder.layers.7.ffn.experts.expert_2":1,
"model.decoder.layers.7.ffn.experts.expert_3":1,
"model.decoder.layers.7.ffn.experts.expert_4":1,
"model.decoder.layers.7.ffn.experts.expert_5":1,
"model.decoder.layers.7.ffn.experts.expert_6":1,
"model.decoder.layers.7.ffn.experts.expert_7":1,
"model.decoder.layers.7.ffn.experts.expert_8":1,
"model.decoder.layers.7.ffn.experts.expert_9":1,
"model.decoder.layers.7.ffn.experts.expert_10":1,
"model.decoder.layers.7.ffn.experts.expert_11":1,
"model.decoder.layers.7.ffn.experts.expert_12":1,
"model.decoder.layers.7.ffn.experts.expert_13":1,
"model.decoder.layers.7.ffn.experts.expert_14":1,
"model.decoder.layers.7.ffn.experts.expert_15":1,
"model.decoder.layers.7.ffn.experts.expert_16":1,
"model.decoder.layers.7.ffn.experts.expert_17":1,
"model.decoder.layers.7.ffn.experts.expert_18":1,
"model.decoder.layers.7.ffn.experts.expert_19":1,
"model.decoder.layers.7.ffn.experts.expert_20":1,
"model.decoder.layers.7.ffn.experts.expert_21":1,
"model.decoder.layers.7.ffn.experts.expert_22":1,
"model.decoder.layers.7.ffn.experts.expert_23":1,
"model.decoder.layers.7.ffn.experts.expert_24":1,
"model.decoder.layers.7.ffn.experts.expert_25":1,
"model.decoder.layers.7.ffn.experts.expert_26":1,
"model.decoder.layers.7.ffn.experts.expert_27":1,
"model.decoder.layers.7.ffn.experts.expert_28":1,
"model.decoder.layers.7.ffn.experts.expert_29":1,
"model.decoder.layers.7.ffn.experts.expert_30":1,
"model.decoder.layers.7.ffn.experts.expert_31":1,
"model.decoder.layers.7.ffn.experts.expert_32":1,
"model.decoder.layers.7.ffn.experts.expert_33":1,
"model.decoder.layers.7.ffn.experts.expert_34":1,
"model.decoder.layers.7.ffn.experts.expert_35":1,
"model.decoder.layers.7.ffn.experts.expert_36":1,
"model.decoder.layers.7.ffn.experts.expert_37":1,
"model.decoder.layers.7.ffn.experts.expert_38":1,
"model.decoder.layers.7.ffn.experts.expert_39":1,
"model.decoder.layers.7.ffn.experts.expert_40":1,
"model.decoder.layers.7.ffn.experts.expert_41":1,
"model.decoder.layers.7.ffn.experts.expert_42":1,
"model.decoder.layers.7.ffn.experts.expert_43":1,
"model.decoder.layers.7.ffn.experts.expert_44":1,
"model.decoder.layers.7.ffn.experts.expert_45":1,
"model.decoder.layers.7.ffn.experts.expert_46":1,
"model.decoder.layers.7.ffn.experts.expert_47":1,
"model.decoder.layers.7.ffn.experts.expert_48":1,
"model.decoder.layers.7.ffn.experts.expert_49":1,
"model.decoder.layers.7.ffn.experts.expert_50":1,
"model.decoder.layers.7.ffn.experts.expert_51":1,
"model.decoder.layers.7.ffn.experts.expert_52":1,
"model.decoder.layers.7.ffn.experts.expert_53":1,
"model.decoder.layers.7.ffn.experts.expert_54":1,
"model.decoder.layers.7.ffn.experts.expert_55":1,
"model.decoder.layers.7.ffn.experts.expert_56":1,
"model.decoder.layers.7.ffn.experts.expert_57":1,
"model.decoder.layers.7.ffn.experts.expert_58":1,
"model.decoder.layers.7.ffn.experts.expert_59":1,
"model.decoder.layers.7.ffn.experts.expert_60":1,
"model.decoder.layers.7.ffn.experts.expert_61":1,
"model.decoder.layers.7.ffn.experts.expert_62":1,
"model.decoder.layers.7.ffn.experts.expert_63":1,
"model.decoder.layers.7.ffn.experts.expert_64":1,
"model.decoder.layers.7.ffn.experts.expert_65":1,
"model.decoder.layers.7.ffn.experts.expert_66":1,
"model.decoder.layers.7.ffn.experts.expert_67":1,
"model.decoder.layers.7.ffn.experts.expert_68":1,
"model.decoder.layers.7.ffn.experts.expert_69":1,
"model.decoder.layers.7.ffn.experts.expert_70":1,
"model.decoder.layers.7.ffn.experts.expert_71":1,
"model.decoder.layers.7.ffn.experts.expert_72":1,
"model.decoder.layers.7.ffn.experts.expert_73":1,
"model.decoder.layers.7.ffn.experts.expert_74":1,
"model.decoder.layers.7.ffn.experts.expert_75":1,
"model.decoder.layers.7.ffn.experts.expert_76":1,
"model.decoder.layers.7.ffn.experts.expert_77":1,
"model.decoder.layers.7.ffn.experts.expert_78":1,
"model.decoder.layers.7.ffn.experts.expert_79":1,
"model.decoder.layers.7.ffn.experts.expert_80":1,
"model.decoder.layers.7.ffn.experts.expert_81":1,
"model.decoder.layers.7.ffn.experts.expert_82":1,
"model.decoder.layers.7.ffn.experts.expert_83":1,
"model.decoder.layers.7.ffn.experts.expert_84":1,
"model.decoder.layers.7.ffn.experts.expert_85":1,
"model.decoder.layers.7.ffn.experts.expert_86":1,
"model.decoder.layers.7.ffn.experts.expert_87":1,
"model.decoder.layers.7.ffn.experts.expert_88":1,
"model.decoder.layers.7.ffn.experts.expert_89":1,
"model.decoder.layers.7.ffn.experts.expert_90":1,
"model.decoder.layers.7.ffn.experts.expert_91":1,
"model.decoder.layers.7.ffn.experts.expert_92":1,
"model.decoder.layers.7.ffn.experts.expert_93":1,
"model.decoder.layers.7.ffn.experts.expert_94":1,
"model.decoder.layers.7.ffn.experts.expert_95":1,
"model.decoder.layers.7.ffn.experts.expert_96":1,
"model.decoder.layers.7.ffn.experts.expert_97":1,
"model.decoder.layers.7.ffn.experts.expert_98":1,
"model.decoder.layers.7.ffn.experts.expert_99":1,
"model.decoder.layers.7.ffn.experts.expert_100":1,
"model.decoder.layers.7.ffn.experts.expert_101":1,
"model.decoder.layers.7.ffn.experts.expert_102":1,
"model.decoder.layers.7.ffn.experts.expert_103":1,
"model.decoder.layers.7.ffn.experts.expert_104":1,
"model.decoder.layers.7.ffn.experts.expert_105":1,
"model.decoder.layers.7.ffn.experts.expert_106":1,
"model.decoder.layers.7.ffn.experts.expert_107":1,
"model.decoder.layers.7.ffn.experts.expert_108":1,
"model.decoder.layers.7.ffn.experts.expert_109":1,
"model.decoder.layers.7.ffn.experts.expert_110":1,
"model.decoder.layers.7.ffn.experts.expert_111":1,
"model.decoder.layers.7.ffn.experts.expert_112":1,
"model.decoder.layers.7.ffn.experts.expert_113":1,
"model.decoder.layers.7.ffn.experts.expert_114":1,
"model.decoder.layers.7.ffn.experts.expert_115":1,
"model.decoder.layers.7.ffn.experts.expert_116":1,
"model.decoder.layers.7.ffn.experts.expert_118":2,
"model.decoder.layers.7.ffn.experts.expert_119":2,
"model.decoder.layers.7.ffn.experts.expert_120":2,
"model.decoder.layers.7.ffn.experts.expert_121":2,
"model.decoder.layers.7.ffn.experts.expert_122":2,
"model.decoder.layers.7.ffn.experts.expert_123":2,
"model.decoder.layers.7.ffn.experts.expert_124":2,
"model.decoder.layers.7.ffn.experts.expert_125":2,
"model.decoder.layers.7.ffn.experts.expert_126":2,
"model.decoder.layers.7.ffn.experts.expert_127":2,
"model.decoder.layers.7.ff_layer_norm":2,
"model.decoder.layers.7.ff_dropout":2,
"model.decoder.layers.8":2,
"model.decoder.layers.9":2,
"model.decoder.layers.10":2,
"model.decoder.layers.11":2,
"model.decoder.layers.12":2,
"model.decoder.layers.13":2,
"model.decoder.layers.14":2,
"model.decoder.layers.15":2,
"model.decoder.layers.16":2,
"model.decoder.layers.17":2,
"model.decoder.layers.18":2,
"model.decoder.layers.19":2,
"model.decoder.layers.20":2,
"model.decoder.layers.21":2,
"model.decoder.layers.22":2,
"model.decoder.layers.23":2,
"model.decoder.layer_norm":2,
"model.encoder.layers.15.ffn.experts.expert_101":1,
"model.decoder.layers.7.ffn.experts.expert_117":2
}
Same issue here.
cc @SunMarc ๐
Hi, I found the issue. In the meantime, the workaround is to put the input on the same device as `model.encoder.layer_norm`. I will fix this in a PR ASAP.
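That interim workaround amounts to a couple of lines. `to_torch_device` below is a hypothetical helper (not part of transformers) that turns an `hf_device_map` entry, which can be an int GPU index or a string like "cpu" or "disk", into something `Tensor.to()` accepts:

```python
def to_torch_device(entry):
    """Convert an hf_device_map value into a device string for Tensor.to()."""
    # Integer entries are GPU indices; string entries ("cpu", "disk") pass through.
    return f"cuda:{entry}" if isinstance(entry, int) else str(entry)

# Usage sketch (not run here):
# target = to_torch_device(model.hf_device_map["model.encoder.layer_norm"])
# inputs = tokenizer(batched_input, return_tensors="pt", padding=True).to(target)
```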
@SunMarc Could you please check my problem? I am trying multi-GPU fine-tuning of NLLB-200-1.3B. I tried your recent encoder hook #25735, but it didn't help me, and the "Expected all tensors to be on the same device" error occurs again.
Hi @molokanov50. Please open a new issue, as this is not linked to this issue, which was about encoder-decoder models in general, not specific to the NLLB model. Also, provide a minimal reproducible script so that I can try to reproduce the error on my side. For now, the following script works as expected:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B", device_map="auto")
input = 'We now have 4-month-old mice that are non-diabetic that used to be diabetic," he added.'
input = tokenizer(input, return_tensors="pt")
translated_tokens = model.generate(
**input, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"]
)
print(tokenizer.decode(translated_tokens[0], skip_special_tokens=True))
System Info
transformers version: 4.28.1

Who can help?
@ArthurZucker

Information
Tasks: examples folder (such as GLUE/SQuAD, ...)

Reproduction
Note: there is a workaround/fix with manual device mapping attached below but I'm wondering if there could be an official fix for the bug.
Code sample: infer.py (mostly from the HF Hub sample, with some modifications to load with multi-GPU and quantization)
Steps:
CUDA_VISIBLE_DEVICES=0,1,2,3 python infer.py
Expected behavior
A list of translated text.
The following code contains a workaround to prevent certain module splits and moves certain modules to the same device as the input in order to run the inference without errors.
Code
Output: