Closed: rakesgi2022 closed this issue 7 months ago
My docker-compose:
version: '3'
services:
  vllm-openai:
    image: vllm/vllm-openai:latest
    environment:
      - HUGGING_FACE_HUB_TOKEN=
    ports:
      - "8000:8000"
    ipc: host
    volumes:
      - /var/lib/vllm/cache/huggingface:/root/.cache/huggingface
    command: ["--chat-template", "vllm.entrypoints.openai.api_server", "--model", "mistralai/Mixtral-8x7B-Instruct-v0.1", "--dtype", "half", "--gpu-memory-utilization", "1", "--load-format", "safetensors", "--tensor-parallel-size", "2", "--worker-use-ray"]
    # environment:
    #   - NVIDIA_VISIBLE_DEVICES=all
    #   - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    runtime: nvidia
    deploy:
      resources:
        # limits:
        #   memory: 15g
        # reservations:
        #   memory: 2g
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
I guess the download failed. You could try removing the downloaded files and retry.
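In case it is useful, here is a minimal cleanup sketch for doing that with the compose file above. It assumes the host path from the volume mapping and the standard huggingface_hub cache layout (hub/models--<org>--<name>); adjust the path if your cache lives somewhere else.

# cleanup sketch (run on the host): remove the partial Mixtral snapshot so the
# next container start re-downloads it from scratch. Paths are assumptions
# based on the volume mapping in the compose file above.
import shutil
from pathlib import Path

cache_root = Path("/var/lib/vllm/cache/huggingface")  # host side of the mounted volume
model_dir = cache_root / "hub" / "models--mistralai--Mixtral-8x7B-Instruct-v0.1"

if model_dir.exists():
    shutil.rmtree(model_dir)  # drop the partial snapshot entirely
    print(f"Removed {model_dir}; the next start will re-download the weights.")
else:
    print(f"Nothing to remove at {model_dir}")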
It will probably work after restarting, but the root cause is not very clear; it looks like network congestion.
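One way to take the network out of the container startup path is to pre-fetch the weights on the host into the same cache that the compose file mounts. This is only a sketch under those assumptions (host path, token env var), not something from the original report; re-running it after a dropped connection should continue from the partially downloaded files.

# pre-fetch sketch: download the weights once on the host into the cache that
# the container mounts at /root/.cache/huggingface, so vLLM only reads local
# files at startup. Paths and env var name are assumptions.
import os
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    cache_dir="/var/lib/vllm/cache/huggingface/hub",        # host side of the volume mapping above
    token=os.environ.get("HUGGING_FACE_HUB_TOKEN"),          # gated repo: a valid token is required
    allow_patterns=["*.safetensors", "*.json", "*.model"],   # skip the duplicate .bin weights
    max_workers=4,
)

With the files already under /var/lib/vllm/cache/huggingface, container startup only reads them from disk, so a flaky connection can no longer kill the Ray workers mid-download.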
Hello,
I get this error when I try to load the mistralai/Mixtral-8x7B-Instruct-v0.1 model in the latest container with 2 A100s... Is it related to the Hugging Face token? I've run out of ideas!
(RayWorkerVllm pid=1334) tensors: 98%|█████████▊| 4.89G/4.98G [04:23<00:09, 9.86MB/s]
model-00009-of-00019.safetensors: 100%|██████████| 4.98G/4.98G [04:28<00:00, 18.6MB/s]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/vllm/entrypoints/openai/api_server.py", line 729, in <module>
engine = AsyncLLMEngine.from_engine_args(engine_args)
File "/workspace/vllm/engine/async_llm_engine.py", line 496, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
File "/workspace/vllm/engine/async_llm_engine.py", line 269, in init
self.engine = self._init_engine(*args, kwargs)
File "/workspace/vllm/engine/async_llm_engine.py", line 314, in _init_engine
return engine_class(*args, *kwargs)
File "/workspace/vllm/engine/llm_engine.py", line 108, in init
self._init_workers_ray(placement_group)
File "/workspace/vllm/engine/llm_engine.py", line 195, in _init_workers_ray
self._run_workers(
File "/workspace/vllm/engine/llm_engine.py", line 755, in _run_workers
self._run_workers_in_batch(workers, method, *args, **kwargs))
File "/workspace/vllm/engine/llm_engine.py", line 732, in _run_workers_in_batch
all_outputs = ray.get(all_outputs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2563, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ChunkedEncodingError): ray::RayWorkerVllm.execute_method() (pid=1334, actor_id=0eaf3e9a36f1ca115ba4106101000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f477f445840>)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 833, in _raw_read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(2720359157 bytes read, 2172450427 more expected)
The above exception was the direct cause of the following exception:
ray::RayWorkerVllm.execute_method() (pid=1334, actor_id=0eaf3e9a36f1ca115ba4106101000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f477f445840>)
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 816, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 934, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 905, in read
data = self._raw_read(amt)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 811, in _raw_read
with self._error_catcher():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 729, in _error_catcher
raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(2720359157 bytes read, 2172450427 more expected)', IncompleteRead(2720359157 bytes read, 2172450427 more expected))
During handling of the above exception, another exception occurred:
ray::RayWorkerVllm.execute_method() (pid=1334, actor_id=0eaf3e9a36f1ca115ba4106101000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f477f445840>)
File "/workspace/vllm/engine/ray_utils.py", line 31, in execute_method
return executor(*args, **kwargs)
File "/workspace/vllm/worker/worker.py", line 79, in load_model
self.model_runner.load_model()
File "/workspace/vllm/worker/model_runner.py", line 57, in load_model
self.model = get_model(self.model_config)
File "/workspace/vllm/model_executor/model_loader.py", line 72, in get_model
model.load_weights(model_config.model, model_config.download_dir,
File "/workspace/vllm/model_executor/models/mixtral.py", line 407, in load_weights
for name, loaded_weight in hf_model_weights_iterator(
File "/workspace/vllm/model_executor/weight_utils.py", line 198, in hf_model_weights_iterator
hf_folder, hf_weights_files, use_safetensors = prepare_hf_model_weights(
File "/workspace/vllm/model_executor/weight_utils.py", line 155, in prepare_hf_model_weights
hf_folder = snapshot_download(model_name_or_path,
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 238, in snapshot_download
thread_map(
File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1170, in __iter__
for obj in iterable:
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 213, in _inner_hf_hub_download
return hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1461, in hf_hub_download
http_get(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 541, in http_get
for chunk in r.iter_content(chunk_size=DOWNLOAD_CHUNK_SIZE):
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 818, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(2720359157 bytes read, 2172450427 more expected)', IncompleteRead(2720359157 bytes read, 2172450427 more expected))
Thanks for your help !
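To rule out the Hugging Face token as the cause (the question above), here is a minimal check sketch using the public huggingface_hub API; a missing or invalid token for this gated repo would fail with 401/403 before any bytes are streamed, whereas the IncompleteRead above points at a dropped connection.

# token check sketch (assumption: huggingface_hub installed on the host,
# HUGGING_FACE_HUB_TOKEN set in the environment as in the compose file).
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HUGGING_FACE_HUB_TOKEN"))
print(api.whoami()["name"])  # raises if the token itself is invalid
print(api.model_info("mistralai/Mixtral-8x7B-Instruct-v0.1").sha)  # raises if gated access is not granted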