xNul / code-llama-for-vscode

Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.
MIT License

When I execute “torchrun --nproc_per_node 1 llamacpp_mock_api.py”, the following error occurs. #6

Closed · HwJhx closed this 2 months ago

HwJhx commented 1 year ago

torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 16713) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

llamacpp_mock_api.py FAILED

Failures:

------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-04_12:12:41
  host      : 13edd873e909
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 16713)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 16713
HwJhx commented 1 year ago

My GPU info is as follows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

BoazimMatrix commented 1 year ago

Did you figure it out? I have the same problem.

xNul commented 2 months ago

Were you able to run Code Llama successfully using the codellama repository?
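For reference, a minimal sanity check with the plain codellama repo would look roughly like the command below. The script name and flag values are a sketch based on the codellama README rather than anything from this thread, so adjust them to match your checkpoint paths:

torchrun --nproc_per_node 1 example_instructions.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4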

It's been nearly a year since this was opened, so I'm going to close it for now, but I'll reopen it if you send another message.