oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

How to specify which GPU to use? CUDA_VISIBLE_DEVICES NOT working. #2559

Closed longzhenren closed 1 year ago

longzhenren commented 1 year ago

Describe the bug

I ran this on a server with 4x RTX 3090. GPU 0 is busy with other tasks, so I want to use GPU 1 or another free GPU. I set the CUDA_VISIBLE_DEVICES environment variable, but it doesn't work. How do I specify which GPU to run on?

Is there an existing issue for this?

Reproduction

Always reproducible when GPU 0 is busy.

Screenshot

No response

Logs

INFO:Loading llama-13b...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:24<00:00,  8.25s/it]
Traceback (most recent call last):
  File "/home/amur/text-generation-webui/server.py", line 1103, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/amur/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/home/amur/text-generation-webui/modules/models.py", line 160, in huggingface_loader
    model = model.cuda()
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.70 GiB total capacity; 8.05 GiB already allocated; 27.56 MiB free; 8.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

System Info

llama-13b
Anaconda3 2023.03
Ubuntu 22.04
Nvidia RTX 3090, with CUDA 11.4 and Driver 470.74
Ph0rk0z commented 1 year ago

CUDA_VISIBLE_DEVICES=1 python server.py works fine for me
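
For what it's worth, CUDA_VISIBLE_DEVICES controls which physical GPUs the process can see at all, and PyTorch renumbers the visible devices starting from zero. A minimal sketch of the remapping (just an illustration, not webui code):

import os

# Must be set before CUDA is initialized, which in practice means
# before the first `import torch`.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch

# Only physical GPU 1 is visible now, and PyTorch renumbers it as cuda:0.
print(torch.cuda.device_count())      # -> 1
print(torch.cuda.get_device_name(0))  # -> name of physical GPU 1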

darkacorn commented 1 year ago

you can also use --device-id in the startup params

longzhenren commented 1 year ago

I am sorry, but I think this parameter is not available in text-generation-webui. I know it is a valid parameter in stable-diffusion-webui, though.

(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id=3
usage: server.py [-h] [--notebook] [--chat] [--character CHARACTER] [--model MODEL] [--lora LORA [LORA ...]]
                 [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--no-stream] [--settings SETTINGS]
                 [--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--cpu] [--auto-devices]
                 [--gpu-memory GPU_MEMORY [GPU_MEMORY ...]] [--cpu-memory CPU_MEMORY] [--disk]
                 [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16] [--no-cache] [--xformers]
                 [--sdp-attention] [--trust-remote-code] [--load-in-4bit] [--compute_dtype COMPUTE_DTYPE]
                 [--quant_type QUANT_TYPE] [--use_double_quant] [--threads THREADS] [--n_batch N_BATCH]
                 [--no-mmap] [--mlock] [--cache-capacity CACHE_CAPACITY] [--n-gpu-layers N_GPU_LAYERS]
                 [--n_ctx N_CTX] [--llama_cpp_seed LLAMA_CPP_SEED] [--wbits WBITS] [--model_type MODEL_TYPE]
                 [--groupsize GROUPSIZE] [--pre_layer PRE_LAYER [PRE_LAYER ...]] [--checkpoint CHECKPOINT]
                 [--monkey-patch] [--quant_attn] [--warmup_autotune] [--fused_mlp] [--gptq-for-llama]
                 [--autogptq] [--triton] [--desc_act] [--flexgen] [--percent PERCENT [PERCENT ...]]
                 [--compress-weight] [--pin-weight [PIN_WEIGHT]] [--deepspeed]
                 [--nvme-offload-dir NVME_OFFLOAD_DIR] [--local_rank LOCAL_RANK] [--rwkv-strategy RWKV_STRATEGY]
                 [--rwkv-cuda-on] [--listen] [--listen-host LISTEN_HOST] [--listen-port LISTEN_PORT] [--share]
                 [--auto-launch] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--api]
                 [--api-blocking-port API_BLOCKING_PORT] [--api-streaming-port API_STREAMING_PORT] [--public-api]
                 [--multimodal-pipeline MULTIMODAL_PIPELINE]
server.py: error: unrecognized arguments: --device-id=3

(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id 3
[same usage output as above]
server.py: error: unrecognized arguments: --device-id 3

darkacorn commented 1 year ago

Ok, my bad, sorry about that. Let me get to my computer and try; I have 2 cards, so I should be able to see the difference. I'll report back in a few minutes.

darkacorn commented 1 year ago

device 0

https://prnt.sc/IodeEPjE8sv0

device 1

https://prnt.sc/NEbE7ZNO5_n9

As you can see, it selects the correct GPU.

--device-id does not work, as you pointed out .. that was my mistake, but the solution Ph0rk0z gave does work.

longzhenren commented 1 year ago

> As you can see, it selects the correct GPU.
>
> --device-id does not work, as you pointed out .. that was my mistake, but the solution Ph0rk0z gave does work.

Hmmm, that is so strange. Did you use Python installed in a conda environment?

darkacorn commented 1 year ago

I basically run it in the conda env from the one-click installer.

longzhenren commented 1 year ago

Okay, I will try that. However, I am using the llama-13b model instead of the default GPTQ one, so I have to modify the shell script XD. Thanks for your help!

Ph0rk0z commented 1 year ago

I have a conda-installed env, but this works for all envs on my system, for instance when running audio generators, SD, etc.

maxchiron commented 1 year ago

Setting the visible GPU IDs in server.py works for me. Add the code below to server.py:

os.environ['CUDA_VISIBLE_DEVICES'] = '1, 2, 3'

longzhenren commented 1 year ago

Sorry, but this doesn't work for me.

maxchiron commented 1 year ago

> Sorry, but this doesn't work for me.

Where did you add the line? It should be added right after os is imported and before all other code.
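
A minimal sketch of the intended placement (the surrounding imports are hypothetical; only the ordering matters):

# Top of server.py: the assignment goes right after `import os`,
# before anything that can initialize CUDA (torch, the webui modules, ...).
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch  # CUDA-aware imports only after the env var is set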


longzhenren commented 1 year ago

> Where did you add the line? It should be added right after os is imported and before all other code.

Sorry to bother, but there was a problem with my Anaconda environment; I fixed it, and it finally works. In detail: I installed Anaconda into /raid/another_username/data/anaconda3/environments, and another user on the server changed the permissions of that directory. As a result, I was actually using the system default Python environment, whose PyTorch build has no CUDA support. I checked and found this problem.
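
For anyone hitting the same symptom, a quick way to check which interpreter is actually running and whether its PyTorch build has CUDA support (a small diagnostic sketch, not part of the webui):

import sys
import torch

print(sys.executable)             # which Python is really being used
print(torch.__version__)          # a '+cpu' suffix means a CPU-only build
print(torch.cuda.is_available())  # False here was the root cause above
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))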

Verinoth commented 8 months ago

To aid others who arrive here via a search engine (this is one of the top results when searching for how to specify a particular GPU): server.py appears to be deprecated as the entry point, and the line must be added to one_click.py now. The setting will take effect on the next launch.

os.environ['CUDA_VISIBLE_DEVICES'] = '1, 2, 3'

where 1, 2, 3 are the GPU IDs that you would like exposed to the software.

rabbitbytes commented 1 week ago

As Google got me here, I thought I would post what I found to fix this issue on Linux. Edit server.py: under import os, add a new line os.environ['CUDA_VISIBLE_DEVICES'] = '1' for your second GPU (or '2', etc.). This works even if you use start_linux.sh or create your own venv and run venv/bin/python ./server.py. I have not tested this on Windows. I post this here for the noobs who just need to know where it goes. I hope this gets the wrapper-script-vs-run-script distinction into noobs' minds like mine. "A kick in the butt can be a great teacher."