skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io

[Example] Mixtral example fail to work on Azure VM due to NCCL error #2905

Closed by Michaelvll 9 months ago

Michaelvll commented 10 months ago

sky launch --disk-tier none -c test-mixtral --cloud azure llm/mixtral/serve.yaml
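
For context, the task being launched corresponds roughly to the following SkyPilot task YAML. This is a minimal sketch reconstructed from the vLLM arguments printed in the log below, not the actual contents of llm/mixtral/serve.yaml; the accelerator spec and setup steps are assumptions.

resources:
  accelerators: A100-80GB:2   # assumption: two GPUs, matching tensor_parallel_size=2 in the log

setup: |
  # assumption: a conda env named "mixtral" (the name appears in the traceback paths) with vLLM installed
  conda create -n mixtral -y python=3.10
  conda activate mixtral
  pip install vllm

run: |
  conda activate mixtral
  python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8000 \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 2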

I 12-27 11:40:38 log_lib.py:431] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.
INFO: All task resources reserved.
INFO: Reserved IPs: ['10.130.0.4']
(task, pid=9688) INFO 12-27 11:40:41 api_server.py:727] args: Namespace(host='0.0.0.0', port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, model='mistralai/Mixtral-8x7B-Instruct-v0.1', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
config.json: 100%|██████████| 720/720 [00:00<00:00, 4.91MB/s]
(task, pid=9688) 2023-12-27 11:40:43,005        INFO worker.py:1724 -- Started a local Ray instance.
(task, pid=9688) INFO 12-27 11:40:43 llm_engine.py:73] Initializing an LLM engine with config: model='mistralai/Mixtral-8x7B-Instruct-v0.1', tokenizer='mistralai/Mixtral-8x7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, enforce_eager=False, seed=0)
tokenizer_config.json: 100%|██████████| 1.46k/1.46k [00:00<00:00, 13.0MB/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 96.4MB/s]
tokenizer.json: 100%|██████████| 1.80M/1.80M [00:00<00:00, 104MB/s]
special_tokens_map.json: 100%|██████████| 72.0/72.0 [00:00<00:00, 805kB/s]
(task, pid=9688) Traceback (most recent call last):
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(task, pid=9688)     return _run_code(code, main_globals, None,
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/runpy.py", line 86, in _run_code
(task, pid=9688)     exec(code, run_globals)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 737, in <module>
(task, pid=9688)     engine = AsyncLLMEngine.from_engine_args(engine_args)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 494, in from_engine_args
(task, pid=9688)     engine = cls(parallel_config.worker_use_ray,
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
(task, pid=9688)     self.engine = self._init_engine(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 312, in _init_engine
(task, pid=9688)     return engine_class(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 112, in __init__
(task, pid=9688)     self._init_workers_ray(placement_group)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 195, in _init_workers_ray
(task, pid=9688)     self._run_workers(
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 755, in _run_workers
(task, pid=9688)     self._run_workers_in_batch(workers, method, *args, **kwargs))
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 732, in _run_workers_in_batch
(task, pid=9688)     all_outputs = ray.get(all_outputs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(task, pid=9688)     return fn(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(task, pid=9688)     return func(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
(task, pid=9688)     raise value.as_instanceof_cause()
(task, pid=9688) ray.exceptions.RayTaskError(DistBackendError): ray::RayWorkerVllm.execute_method() (pid=12792, ip=10.130.0.4, actor_id=9241629b5329c36992ed7e6301000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x15334ad95f90>)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 31, in execute_method
(task, pid=9688)     return executor(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/worker/worker.py", line 72, in init_model
(task, pid=9688)     _init_distributed_environment(self.parallel_config, self.rank,
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/vllm/worker/worker.py", line 190, in _init_distributed_environment
(task, pid=9688)     torch.distributed.all_reduce(torch.zeros(1).cuda())
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
(task, pid=9688)     return func(*args, **kwargs)
(task, pid=9688)   File "/home/azureuser/miniconda3/envs/mixtral/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2050, in all_reduce
(task, pid=9688)     work = group.allreduce([tensor], opts)
(task, pid=9688) torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.18.1
(task, pid=9688) ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. 
(task, pid=9688) Last error:
(task, pid=9688) XML Import Channel : dev 2 not found.
INFO: Job finished (status: SUCCEEDED).
yangyingxiang commented 10 months ago

@Michaelvll have you resolved it?

Michaelvll commented 10 months ago

Thanks for the ping, @yangyingxiang! We have not figured out the root cause yet. It seems to be an issue with Azure's GPU setup, and others have reported the same problem: https://github.com/NVIDIA/nccl/issues/1101.

We will keep an eye on this issue and look into possible causes. Please let us know if you find a solution. : )
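
As the error message itself suggests, re-running with NCCL_DEBUG=INFO should surface more detail about where initialization fails. A minimal sketch, assuming the task YAML's envs field (or equivalently sky launch --env NCCL_DEBUG=INFO):

envs:
  NCCL_DEBUG: INFO                     # print NCCL init/transport details to the task log
  NCCL_DEBUG_SUBSYS: INIT,GRAPH,NET    # optional: limit output to the relevant subsystems

sky launch --disk-tier none -c test-mixtral --cloud azure llm/mixtral/serve.yaml

Since the "XML Import Channel : dev 2 not found" message comes from NCCL's topology-XML parsing, it may also be worth checking in that debug output whether the Azure image sets NCCL_TOPO_FILE to a topology file that does not match this VM size (an assumption to verify, not a confirmed cause).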