Open · Emersonksc opened this issue 2 weeks ago
By the way, the script worked fine with Llama 3 8B Instruct, so I assume the model matters.
@Emersonksc Thanks for reporting this bug. However, I cannot reproduce it. Can you double-check your llama-recipes version? Here is my log, please take a look:
~/work/llama-recipes (main)]$ python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name meta-llama/Llama-3.2-3B-Instruct --prompt_file recipes/quickstart/inference/local_inference/chat_completion/chats.json --max_new_tokens 20 --enable_saleforce_content_safety False
/home/kaiwu/work/llama-recipes/src/llama_recipes/model_checkpointing/checkpoint_handler.py:17: DeprecationWarning: `torch.distributed._shard.checkpoint` will be deprecated, use `torch.distributed.checkpoint` instead
from torch.distributed._shard.checkpoint import (
User dialogs:
[[{'role': 'user', 'content': 'what is the recipe of mayonnaise?'}], [{'role': 'user', 'content': 'I am going to Paris, what should I see?'}, {'role': 'assistant', 'content': "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."}, {'role': 'user', 'content': 'What is so great about #1?'}], [{'role': 'system', 'content': 'Always answer with Haiku'}, {'role': 'user', 'content': 'I am going to Paris, what should I see?'}], [{'role': 'system', 'content': 'Always answer with emojis'}, {'role': 'user', 'content': 'How to go from Beijing to NY?'}], [{'role': 'system', 'content': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."}, {'role': 'user', 'content': 'Write a brief birthday message to John'}]]
==================================
use_fast_kernelsFalse
config.json: 100%|██████████| 878/878 [00:00<00:00, 7.18MB/s]
model.safetensors.index.json: 100%|██████████| 20.9k/20.9k [00:00<00:00, 60.5MB/s]
model-00001-of-00002.safetensors: 100%|██████████| 4.97G/4.97G [01:57<00:00, 42.2MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 1.46G/1.46G [00:34<00:00, 42.5MB/s]
Downloading shards: 100%|██████████| 2/2 [02:32<00:00, 76.19s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.54s/it]
generation_config.json: 100%|██████████| 189/189 [00:00<00:00, 1.54MB/s]
tokenizer_config.json: 100%|██████████| 54.5k/54.5k [00:00<00:00, 19.0MB/s]
tokenizer.json: 100%|██████████| 9.09M/9.09M [00:00<00:00, 30.1MB/s]
special_tokens_map.json: 100%|██████████| 296/296 [00:00<00:00, 2.43MB/s]
User prompt deemed safe.
User prompt:
what is the recipe of mayonnaise?
==================================
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
User input and model output deemed safe.
Model output:
system
Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024
user
what is the recipe of mayonnaise?assistant
The classic recipe for mayonnaise is a bit of a tricky process, but
==================================
User prompt deemed safe.
User prompt:
I am going to Paris, what should I see?
==================================
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system
Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024
user
I am going to Paris, what should I see?assistant
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.user
What is so great about #1?assistant
The Eiffel Tower is an iconic symbol of Paris and one of the most
==================================
User prompt deemed safe.
User prompt:
Always answer with Haiku
==================================
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system
Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024
Always answer with Haikuuser
I am going to Paris, what should I see?assistant
Eiffel Tower high
Louvre's art treasures abide
City's gentle
==================================
User prompt deemed safe.
User prompt:
Always answer with emojis
==================================
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system
Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024
Always answer with emojisuser
How to go from Beijing to NY?assistant
[emoji response; garbled in the pasted log]:
1. Beijing [garbled emoji]
==================================
User prompt deemed safe.
User prompt:
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
==================================
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
User input and model output deemed safe.
Model output:
system
Cutting Knowledge Date: December 2023
Today Date: 04 Nov 2024
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.user
Write a brief birthday message to Johnassistant
Here's a brief birthday message for John:
"Happy Birthday, John! W
==================================
~/work/llama-recipes (main)]$ pip list | grep llama
llama-recipes 0.0.4.post1 /home/kaiwu/work/llama-recipes
I found that when I ran the single command it worked fine, but after adding `export CUDA_VISIBLE_DEVICES=1` it reported the error.
Maybe you missed `export CUDA_VISIBLE_DEVICES=1`?
@wukaixingxp
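For context, a minimal sketch (my own, not from the recipe) of what `export CUDA_VISIBLE_DEVICES=1` does inside the process: it hides every GPU except physical GPU 1, which PyTorch then re-indexes as cuda:0, so a single-GPU run should still see exactly one device at index 0:

import os

# Mask all GPUs except physical GPU 1; this must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())      # expected: 1 (only the masked-in GPU is visible)
print(torch.cuda.current_device())    # expected: 0 (the visible GPU is re-indexed)
print(torch.cuda.get_device_name(0))  # name of physical GPU 1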
System Info
Ubuntu 22.04, torch 2.5.0, CUDA 12.4, running on a single GPU with CUDA_VISIBLE_DEVICES=1
Information
🐛 Describe the bug
python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name "/home/emerson/AI/LLM/models/llama/Llama-3.2-3B-Instruct" --prompt_file "recipes/quickstart/inference/local_inference/chat_completion/girlfriend_chat_completion.json" --max_new_tokens 20 --enable_saleforce_content_safety False
Error logs
error: File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 141, in <module>
fire.Fire(main)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 107, in main
outputs = model.generate(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward
outputs = self.model(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward
position_embeddings = self.rotary_emb(hidden_states, position_ids)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 158, in forward
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
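For reference, this RuntimeError is the generic PyTorch error raised whenever an op mixes a CUDA tensor with a CPU tensor; a minimal, self-contained illustration (not the recipe's code, requires a CUDA GPU):

import torch

# A batched matmul between a CUDA tensor and a CPU tensor fails the same way
# as the rotary-embedding matmul in the traceback above.
a = torch.randn(1, 4, 8, device="cuda:0")
b = torch.randn(1, 8, 16)  # left on the CPU
c = a @ b  # RuntimeError: Expected all tensors to be on the same device ...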
Expected behavior
chat_completion.py should run with the Llama 3.2 Instruct models.
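One direction that may be worth checking (my assumption, not a confirmed fix): the traceback points at a matmul between a CUDA tensor and a CPU tensor inside the rotary embedding, so verifying that the prompt tensor is placed on the model's actual device, rather than a hard-coded one, could help isolate the problem. Below is a rough standalone sketch outside the recipe; the dtype, device_map="auto", and the example prompt are assumptions, not taken from chat_completion.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical standalone check, not the recipe's code: load the model on the
# single visible GPU and keep the prompt on the same device it was dispatched to.
model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "I am going to Paris, what should I see?"}]
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
# Match the model's device instead of assuming cuda:0.
input_ids = input_ids.to(next(model.parameters()).device)

with torch.no_grad():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))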