openppl-public / ppl.llm.serving

Apache License 2.0

nccl error #17

Closed: sleepwalker2017 closed this issue 9 months ago

sleepwalker2017 commented 10 months ago

On A30, it works fine.

On V100, running llama-13B on two GPUs fails with the NCCL errors below.

NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.7
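
For reference, the CUDA and NCCL versions the failing process actually links against can be confirmed at runtime. A minimal sketch (assuming the NCCL development headers are installed; ncclGetVersion, cudaRuntimeGetVersion, and cudaDriverGetVersion are standard NCCL/CUDA runtime calls):

// version_check.cc: print the NCCL/CUDA versions linked at runtime.
// Build (paths may differ on your system): nvcc version_check.cc -lnccl -o version_check
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int nccl_ver = 0, rt_ver = 0, drv_ver = 0;
    ncclGetVersion(&nccl_ver);      // e.g. 21403 for NCCL 2.14.3
    cudaRuntimeGetVersion(&rt_ver); // e.g. 11070 for CUDA 11.7
    cudaDriverGetVersion(&drv_ver); // highest CUDA version the driver supports
    printf("NCCL %d, CUDA runtime %d, driver CUDA %d\n", nccl_ver, rt_ver, drv_ver);
    return 0;
}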

[ERROR][2023-09-12 05:21:16.518][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.518][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.518][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.518][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.518][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.518][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.518][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.518][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.519][llama_worker.cc:922] ParallelExecute(RunModelTask) failed.
[INFO][2023-09-12 05:21:16.519][llama_worker.cc:1043] waiting for request ...
[ERROR][2023-09-12 05:21:16.520][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.520][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.520][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.520][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.520][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
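
NCCL result code 1 is ncclUnhandledCudaError, i.e. a CUDA call inside NCCL failed rather than NCCL's own logic; running the server with NCCL_DEBUG=WARN (or INFO) in the environment usually prints the underlying CUDA error. To rule the framework out, a standalone grouped all-reduce across the same two GPUs can reproduce (or exonerate) the ncclGroupEnd failure. A minimal sketch, independent of ppl.llm.serving:

// nccl_check.cc: single-process grouped all-reduce across 2 GPUs,
// mirroring the ncclGroupStart/ncclGroupEnd pattern that fails above.
// Build (paths may differ on your system): nvcc nccl_check.cc -lnccl -o nccl_check
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error '%s' at line %d\n", cudaGetErrorString(e), __LINE__); return 1; } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL error %d '%s' at line %d\n", (int)r, ncclGetErrorString(r), __LINE__); return 1; } } while (0)

int main() {
    const int ndev = 2;           // matches tensor_parallel_size: 2
    int devs[ndev] = {0, 1};
    ncclComm_t comms[ndev];
    cudaStream_t streams[ndev];
    float* bufs[ndev];
    const size_t count = 1 << 20; // 1M floats per GPU

    for (int i = 0; i < ndev; ++i) {
        CHECK_CUDA(cudaSetDevice(devs[i]));
        CHECK_CUDA(cudaMalloc(&bufs[i], count * sizeof(float)));
        CHECK_CUDA(cudaStreamCreate(&streams[i]));
    }
    CHECK_NCCL(ncclCommInitAll(comms, ndev, devs)); // one comm per device, one process

    // Grouped collective: a failure on either device surfaces at ncclGroupEnd,
    // just like in the log above.
    CHECK_NCCL(ncclGroupStart());
    for (int i = 0; i < ndev; ++i)
        CHECK_NCCL(ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum, comms[i], streams[i]));
    CHECK_NCCL(ncclGroupEnd());

    for (int i = 0; i < ndev; ++i) {
        CHECK_CUDA(cudaSetDevice(devs[i]));
        CHECK_CUDA(cudaStreamSynchronize(streams[i]));
        CHECK_CUDA(cudaFree(bufs[i]));
        ncclCommDestroy(comms[i]);
    }
    printf("grouped all-reduce across %d GPUs succeeded\n", ndev);
    return 0;
}

If this check also fails, the problem lies in the CUDA/NCCL install or the V100 pair's topology (e.g. peer-to-peer access between the two cards) rather than in ppl.llm.serving; setting NCCL_P2P_DISABLE=1 is a common environment-variable experiment to narrow that down.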

Here is my config.json:

{
    "model_dir":  "/data/codes/ppl/llama-13b",
    "model_param_path": "/data/codes/ppl/llama-13b/params.json",

    "tokenizer_path": "/data/LLaMA-7B/tokenizer.model",

    "tensor_parallel_size": 2,

    "top_p": 0.0,
    "top_k": 1,

    "max_tokens_scale": 0.94,
    "max_tokens_per_request": 4096,
    "max_running_batch": 1024,

    "host": "0.0.0.0",
    "port": 10086
}
Vincent-syr commented 9 months ago

Please try again with the latest version, and please file the issue using our template.

Hijdk commented 8 months ago

@Vincent-syr Hello, how was this resolved? I recently ran into the same problem with the latest code as well.