When I execute “torchrun --nproc_per_node 1 llamacpp_mock_api.py”, the following error occurs.

HwJhx commented 1 year ago

torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 16713) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

llamacpp_mock_api.py FAILED

Failures:
------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-04_12:12:41
  host      : 13edd873e909
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 16713)
  error_file:
  traceback : Signal 9 (SIGKILL) received by PID 16713
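The `exitcode : -9` and `traceback : Signal 9 (SIGKILL)` lines in the report describe the same event: the worker process did not exit on its own but was terminated by signal 9. The sketch below shows how to decode such a value, assuming a POSIX system and the Python `subprocess` convention that a negative return code -N means the child was killed by signal N. `describe_exitcode` is a hypothetical helper for illustration, not part of torchrun or this repository.

```python
import signal
import subprocess
import sys


def describe_exitcode(exitcode: int) -> str:
    """Translate a child-process exit code into a readable description.

    Hypothetical helper. Follows the subprocess.Popen.returncode convention
    on POSIX: a negative value -N means the child was terminated by signal N.
    """
    if exitcode < 0:
        sig = signal.Signals(-exitcode)
        return f"killed by signal {-exitcode} ({sig.name})"
    return f"exited normally with status {exitcode}"


if __name__ == "__main__":
    # The value torchrun reported for local_rank 0 in the log above.
    print(describe_exitcode(-9))  # killed by signal 9 (SIGKILL)

    # Reproduce the convention: run a child that sends SIGKILL to itself.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(child.returncode, describe_exitcode(child.returncode))
```

SIGKILL cannot be caught or handled by the process, which is why the report's `error_file` field is empty and no Python traceback from inside `llamacpp_mock_api.py` appears: the signal came from outside the worker.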

HwJhx commented 1 year ago

My GPU info is below:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

BoazimMatrix commented 1 year ago

Did you figure it out? I have the same problem.

xNul commented 2 months ago

Were you able to run Code Llama successfully using the codellama repository?

It's been nearly a year since this was opened, so I'm going to close it for now, but I'll reopen it if you send another message.

xNul / code-llama-for-vscode

When I execute “torchrun --nproc_per_node 1 llamacpp_mock_api.py”, the following error occurs. #6
