meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
15.29k stars 2.21k forks source link

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! error for chatcompletion.py with llama 3.2 instruct model #771

Open Emersonksc opened 2 weeks ago

Emersonksc commented 2 weeks ago

System Info

ubuntu 22.04 torch 2.5.0 cuda 12.4 running on a single gpu with CUDA_VISIBLE_DEVICES=1 image

Information

πŸ› Describe the bug

python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name "/home/emerson/AI/LLM/models/llama/Llama-3.2-3B-Instruct" --prompt_file "recipes/quickstart/inference/local_inference/chat_completion/girlfriend_chat_completion.json" --max_new_tokens 20 --enable_saleforce_content_safety False

Error logs

error: File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 141, in fire.Fire(main) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 135, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace component = fn(*varargs, kwargs) File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 107, in main outputs = model.generate( File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate result = self._sample( File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample outputs = self(model_inputs, return_dict=True) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl return forward_call(*args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward outputs = self.model( File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl return forward_call(args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward position_embeddings = self.rotary_emb(hidden_states, position_ids) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl return forward_call(*args, *kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, kwargs) File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 158, in forward _freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDAbmm)

Expected behavior

run chat_completion.py with llama3.2 instruct models

Emersonksc commented 2 weeks ago

by the way, the script worked fine with llama 3 8b instruct, I assume the model matters

wukaixingxp commented 2 weeks ago

@Emersonksc Thanks for your report on this bug. However I can not reproduce this, can you double check you llama-recipe version? Here is the log, please take a look:

~/work/llama-recipes (main)]$ python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name meta-llama/Llama-3.2-3B-Instruct --prompt_file recipes/quickstart/inference/local_inference/chat_completion/  --prompt_file recipes/quickstart/inference/local_inference/chat_completion/chats.json  --max_new_tokens 20 --enable_saleforce_content_safety False
/home/kaiwu/work/llama-recipes/src/llama_recipes/model_checkpointing/checkpoint_handler.py:17: DeprecationWarning: `torch.distributed._shard.checkpoint` will be deprecated, use `torch.distributed.checkpoint` instead
  from torch.distributed._shard.checkpoint import (
User dialogs:
[[{'role': 'user', 'content': 'what is the recipe of mayonnaise?'}], [{'role': 'user', 'content': 'I am going to Paris, what should I see?'}, {'role': 'assistant', 'content': "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."}, {'role': 'user', 'content': 'What is so great about #1?'}], [{'role': 'system', 'content': 'Always answer with Haiku'}, {'role': 'user', 'content': 'I am going to Paris, what should I see?'}], [{'role': 'system', 'content': 'Always answer with emojis'}, {'role': 'user', 'content': 'How to go from Beijing to NY?'}], [{'role': 'system', 'content': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."}, {'role': 'user', 'content': 'Write a brief birthday message to John'}]]

==================================

use_fast_kernelsFalse
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 878/878 [00:00<00:00, 7.18MB/s]
model.safetensors.index.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20.9k/20.9k [00:00<00:00, 60.5MB/s]
model-00001-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.97G/4.97G [01:57<00:00, 42.2MB/s]
model-00002-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.46G/1.46G [00:34<00:00, 42.5MB/s]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [02:32<00:00, 76.19s/it]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:03<00:00,  1.54s/it]
generation_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 189/189 [00:00<00:00, 1.54MB/s]
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 54.5k/54.5k [00:00<00:00, 19.0MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9.09M/9.09M [00:00<00:00, 30.1MB/s]
special_tokens_map.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 296/296 [00:00<00:00, 2.43MB/s]
User prompt deemed safe.
User prompt:
 what is the recipe of mayonnaise?

==================================

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
User input and model output deemed safe.
Model output:
system

Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024

user

what is the recipe of mayonnaise?assistant

The classic recipe for mayonnaise is a bit of a tricky process, but

==================================

User prompt deemed safe.
User prompt:
 I am going to Paris, what should I see?

==================================

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system

Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024

user

I am going to Paris, what should I see?assistant

Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.user

What is so great about #1?assistant

The Eiffel Tower is an iconic symbol of Paris and one of the most

==================================

User prompt deemed safe.
User prompt:
 Always answer with Haiku

==================================

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system

Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024

Always answer with Haikuuser

I am going to Paris, what should I see?assistant

Eiffel Tower high
Louvre's art treasures abide
City's gentle

==================================

User prompt deemed safe.
User prompt:
 Always answer with emojis

==================================

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system

Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024

Always answer with emojisuser

How to go from Beijing to NY?assistant

πŸ—ΊοΈπŸš‚βœˆοΈ:

1. Beijing οΏ½

==================================

User prompt deemed safe.
User prompt:
 You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.

==================================

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system

Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.user

Write a brief birthday message to Johnassistant

Here's a brief birthday message for John:

"Happy Birthday, John! W

==================================
~/work/llama-recipes (main)]$ pip list | grep llama
llama-recipes                            0.0.4.post1   /home/kaiwu/work/llama-recipes
Emersonksc commented 2 weeks ago

I found when I used the single command, it worked fine, but added the export CUDA_VISIBLE_DEVICES=1, it reported the error.

Emersonksc commented 2 weeks ago

maybe you missed export CUDA_VISIBLE_DEVICES=1

Emersonksc commented 2 weeks ago

@wukaixingxp