meta-llama / llama3

The official Meta Llama 3 GitHub site

run the model locally, the command errors, help me please #74

Open yanguangcang2019 opened 4 months ago

yanguangcang2019 commented 4 months ago

```
(llama3_env) root@cuda22:~/llama3# torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir /root/llama3/Meta-Llama-3-8B/ --tokenizer_path /root/llama3/Meta-Llama-3-8B/tokenizer.model --max_seq_len 512 --max_batch_size 6
initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
[2024-04-19 13:35:09,072] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 1231906) of binary: /root/llama3_env/bin/python
Traceback (most recent call last):
  File "/root/llama3_env/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/llama3_env/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/root/llama3_env/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/root/llama3_env/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/root/llama3_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/llama3_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-19_13:35:09
  host      : cuda22
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 1231906)
  error_file:
  traceback : Signal 9 (SIGKILL) received by PID 1231906
============================================================
```
ejsd1989 commented 4 months ago

Hey @yanguangcang2019, thanks for giving this a shot.

Could you please provide more info about your system setup? How much memory is available? I see a SIGKILL, so I'm wondering if there was an out-of-memory issue.

I assume you've confirmed your torch configuration, but it might be good to double-check your CUDA version (`nvcc --version`) against your PyTorch version.
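
A quick way to compare the two from the PyTorch side, a minimal sketch using only standard PyTorch attributes:

```python
# Sketch: show which CUDA runtime PyTorch was built against and whether it can see a GPU.
# If torch.version.cuda is None, a CPU-only wheel is installed and nothing will run on
# the GPU regardless of the system's nvcc version.
import torch

print("PyTorch version:   ", torch.__version__)
print("Built against CUDA:", torch.version.cuda)
print("CUDA available:    ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:            ", torch.cuda.get_device_name(0))
```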

yanguangcang2019 commented 4 months ago

Thank you for your response. I have an NVIDIA V4096 graphics card, and my system has 15GB of memory. After executing the command, I monitored the system's memory usage and noticed that an error occurred after the system memory was fully occupied. Is this because the default is to use system memory for inference rather than the graphics card? Would I need to set parameters for graphics card inference? I downloaded and deployed the model from GitHub.
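
For what it's worth, exit code -9 (SIGKILL) during loading usually means the Linux OOM killer stepped in. A rough sanity check is to compare the checkpoint size against available RAM; the sketch below assumes the standard `consolidated.00.pth` file name from the Meta-Llama-3-8B download:

```python
# Rough sanity check (sketch): the checkpoint is typically read into host RAM before
# any tensors can be moved to the GPU, so available RAM should comfortably exceed
# its size. The path below assumes the standard Meta-Llama-3-8B download layout.
import os

ckpt = "/root/llama3/Meta-Llama-3-8B/consolidated.00.pth"
ckpt_gib = os.path.getsize(ckpt) / 2**30

with open("/proc/meminfo") as f:
    meminfo = dict(line.split(":", 1) for line in f)
avail_gib = int(meminfo["MemAvailable"].split()[0]) / 2**20  # value is reported in kB

print(f"checkpoint size : {ckpt_gib:.1f} GiB")
print(f"available RAM   : {avail_gib:.1f} GiB")
```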

yanguangcang2019 commented 4 months ago

```
(llama3_env) root@cuda22:~/llama3# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
```

ejsd1989 commented 4 months ago

So, I'm not sure NVIDIA V4096 is a valid card, but I'll trust that it is and that it should work with enough memory. One thing you can try, to force processing onto your CUDA-supported GPU, is to add the following to example_chat_completion.py:

```python
import torch
torch.set_default_device('cuda')
```

Once you've done that, you should be able to use `nvidia-smi` to monitor GPU memory usage.
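
If watching `nvidia-smi` externally is awkward, a small helper like the one below can report the same thing from inside the script. This is just an illustrative sketch, not part of the repo, and `log_gpu_mem` is a name chosen here for the example:

```python
import torch

def log_gpu_mem(tag: str) -> None:
    # Hypothetical helper (not in the Llama 3 repo): print how much CUDA memory this
    # process has allocated/reserved on GPU 0, as a cross-check on nvidia-smi.
    if not torch.cuda.is_available():
        print(f"[{tag}] CUDA not available")
        return
    allocated = torch.cuda.memory_allocated(0) / 2**30
    reserved = torch.cuda.memory_reserved(0) / 2**30
    print(f"[{tag}] GPU memory: {allocated:.2f} GiB allocated, {reserved:.2f} GiB reserved")

# e.g. call log_gpu_mem("after model build") right after the generator is created
```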

yanguangcang2019 commented 4 months ago

In the example_text_completion.py file, I added import torch and torch.set_default_device('cuda'), but the same error persists. During the run, I used nvidia-smi to monitor GPU memory usage, but no increase in GPU memory was observed. Instead, only system memory increased until it was exhausted, eventually leading to the error. Code:

```python
from typing import List, Optional

import fire
import torch

# Set the default device to CUDA (GPU) if available
torch.set_default_device('cuda')

# Check if CUDA is available and print the result
if torch.cuda.is_available():
    print("CUDA is available. Using GPU.")
else:
    print("CUDA is not available. Using CPU.")
```

Output:

```
CUDA is available. Using GPU.
```
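
That behaviour is consistent with the checkpoint being staged on the CPU before anything reaches the GPU. If the reference loader follows the common pattern of `torch.load(..., map_location="cpu")` followed by `load_state_dict`, it looks roughly like the sketch below; the path and commented steps are illustrative, not the repo's exact code:

```python
# Sketch of the usual loading pattern (illustrative, not the repo's exact code):
# the whole checkpoint is deserialized into host RAM first, so an 8B fp16 model
# needs on the order of 16 GiB of free RAM even if it will ultimately run on GPU.
import torch

ckpt_path = "/root/llama3/Meta-Llama-3-8B/consolidated.00.pth"  # assumed path
checkpoint = torch.load(ckpt_path, map_location="cpu")  # fills system RAM here

# model = Transformer(model_args)       # weights created on the default device
# model.load_state_dict(checkpoint)     # copies from the CPU-side checkpoint
# del checkpoint                        # host RAM is only released after this
```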

yanguangcang2019 commented 4 months ago

But running Llama 3 8B in Ollama works fine.

noslenwerdna commented 4 months ago

I am having the same problem with an NVIDIA GeForce RTX 3090. Same thing: CPU memory maxes out, but GPU memory does not change.

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```

utkarsh27a commented 2 months ago

I encountered the same error while using an NVIDIA L4 GPU with Driver Version 555.42.02 and CUDA Version 12.5 on Ubuntu OS.

I resolved the issue by installing the NVIDIA CUDA Toolkit with the following command:

```
sudo apt install nvidia-cuda-toolkit
```

Please try this solution and let us know if it resolves the issue for you.

cshanes commented 1 month ago

I had this issue running Llama 3 8B-Instruct on a machine with 16 GB RAM and 24 GB VRAM. I switched to an instance with 32 GB RAM and 24 GB VRAM and it worked.
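
That matches the back-of-the-envelope math: an 8B-parameter model in fp16 needs roughly 15 GiB just for the weights, and the weights pass through host RAM during loading, so a 16 GB machine is right at the edge. A quick sketch of the estimate:

```python
# Rough estimate (sketch): weight memory for Llama 3 8B in fp16/bf16.
# Actual peak RAM during loading is higher because of Python/torch overhead
# and any temporary copies made while the state dict is materialized.
params = 8e9          # ~8 billion parameters
bytes_per_param = 2   # fp16 / bf16
weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB just for the weights")  # ~14.9 GiB
```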