Open ChristosPeridis opened 1 year ago
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
@ChristosPeridis sorry for missing this one.
Have you tried `CUDA_VISIBLE_DEVICES=uuid1,uuid2,...,uuid14 ray start --num-gpus=14`?
This will start a Ray node with 14 logical GPUs (each one being a 1g.5gb MIG instance).
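A minimal sketch of the programmatic equivalent (the UUIDs below are placeholders; on a MIG setup they are the `MIG-...` strings printed by `nvidia-smi -L`):

```python
import os
import ray

# Placeholder MIG UUIDs; substitute the 14 values reported by `nvidia-smi -L`.
mig_uuids = [
    "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    # ... the remaining 13 GPU-instance UUIDs ...
]

# Expose the MIG instances before Ray starts, so each one is treated
# as a separate logical GPU by Ray's scheduler.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(mig_uuids)
ray.init(num_gpus=len(mig_uuids))
```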
I have a similar issue. I'm trying to run vLLM with tensor parallelism of 2 across two MIG partitions. I'm exposing them via `CUDA_VISIBLE_DEVICES=uuid,etc` as @jjyao suggests, but I get:
```
TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
/miniforge3/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
WARNING 05-21 18:38:28 config.py:1086] Casting torch.bfloat16 to torch.float16.
2024-05-21 18:38:31,118 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-21 18:38:31 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='mistralai/Mixtral-8x7B-Instruct-v0.1', speculative_config=None, tokenizer='mistralai/Mixtral-8x7B-Instruct-v0.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=16384, download_dir='/data', load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=mistralai/Mixtral-8x7B-Instruct-v0.1)
(pid=488) /miniforge3/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
(pid=488) warnings.warn(
INFO 05-21 18:38:35 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=528) INFO 05-21 18:38:35 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-21 18:38:36 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
INFO 05-21 18:38:36 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=528) INFO 05-21 18:38:36 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
(RayWorkerWrapper pid=528) INFO 05-21 18:38:36 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] Traceback (most recent call last):
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] File "/miniforge3/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] return executor(*args, **kwargs)
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] File "/miniforge3/lib/python3.10/site-packages/vllm/worker/worker.py", line 102, in init_device
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] torch.cuda.set_device(self.device)
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] File "/miniforge3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in set_device
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] torch._C._cuda_setDevice(device)
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] RuntimeError: CUDA error: invalid device ordinal
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(RayWorkerWrapper pid=528) ERROR 05-21 18:38:37 worker_base.py:145]
```

My UUIDs for MIG devices begin with `MIG-` instead of `GPU-`, so maybe that is a clue as to why it isn't working. I have been running this model and other models with tensor parallelism of 2 fine, and models with tensor parallelism of 1 on MIG devices fine as well; the combination of the two seems to be an issue for Ray.
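For reference, a minimal sketch of what I am running, reconstructed from the log above (the MIG UUIDs are placeholders):

```python
import os
from vllm import LLM

# Two MIG GPU-instance UUIDs (placeholders) exposed before vLLM/Ray start.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-<uuid-1>,MIG-<uuid-2>"

# vLLM launches a local Ray instance and one RayWorkerWrapper per tensor-parallel
# rank; with MIG UUIDs the second rank dies with "CUDA error: invalid device ordinal".
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,
    trust_remote_code=True,
    dtype="float16",
)
```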
Dear members of the Ray team,
I am working with DRL algorithms using RLlib. I am configuring and running multiple experiments using the Tune API (tune.run()) as well as the different DRL algorithms that the RLlib API offers. My code runs on a server equipped with two NVIDIA A100 GPUs. On this server I have configured both A100s with the "MIG 1g.5gb" profile, which splits each A100 into 7 GIs (GPU Instances), each with a unique UUID. I want to run the DDPPO algorithm with each worker using one of the 14 available MIG GIs. How can I do this?
I have tried updating the os.environ dictionary and adding a "CUDA_VISIBLE_DEVICES" key listing all the MIG GI UUIDs I want to use before initializing the Ray session; however, it did not work. I then tried passing the IDs as numbers ("0, 1, 2, ...") instead, but that did not work either.
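For concreteness, here is a minimal sketch of the kind of setup I am attempting, assuming the classic Tune/RLlib config API (the environment name, worker count, and UUIDs are placeholders, and the `num_gpus`/`num_gpus_per_worker` split follows DD-PPO's convention of doing the learning on the workers rather than the driver):

```python
import os
import ray
from ray import tune

# Placeholder MIG GI UUIDs; on my server these would be the 14 values from `nvidia-smi -L`.
mig_uuids = ["MIG-<uuid-1>", "MIG-<uuid-2>"]  # ... up to 14 ...

# Expose the GIs before Ray starts so each one shows up as a logical GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(mig_uuids)
ray.init(num_gpus=len(mig_uuids))

tune.run(
    "DDPPO",
    config={
        "env": "CartPole-v1",           # placeholder environment
        "framework": "torch",
        "num_workers": len(mig_uuids),  # one worker per MIG GI
        "num_gpus": 0,                  # DD-PPO learns on the workers, not the driver
        "num_gpus_per_worker": 1,       # each worker claims one exposed MIG instance
    },
)
```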
Could you please provide me with some advice on how I should set up my system so that I can leverage the different GIs?
I am always at your disposal for any further queries regarding my use case and setup.
Thank you very much for your valuable help!
Kind regards,
Christos Peridis