oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

How to specify which GPU to use? CUDA_VISIBLE_DEVICES NOT working. #2559

Closed: longzhenren closed this issue 1 year ago

longzhenren commented 1 year ago

Describe the bug

I ran this on a server with 4x RTX 3090. GPU 0 is busy with other tasks, so I want to use GPU 1 or another free GPU. I set the CUDA_VISIBLE_DEVICES environment variable, but it doesn't work. How do I specify which GPU to run on?

Is there an existing issue for this?

Reproduction

Always, when GPU 0 is busy.

Screenshot

No response

Logs

INFO:Loading llama-13b...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:24<00:00,  8.25s/it]
Traceback (most recent call last):
  File "/home/amur/text-generation-webui/server.py", line 1103, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/amur/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/home/amur/text-generation-webui/modules/models.py", line 160, in huggingface_loader
    model = model.cuda()
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/amur/anaconda3/envs/webui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.70 GiB total capacity; 8.05 GiB already allocated; 27.56 MiB free; 8.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
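
The traceback is informative: huggingface_loader calls model.cuda() with no device argument, which moves the model to the current CUDA device, i.e. GPU 0 unless the device list has been remapped. That is why CUDA_VISIBLE_DEVICES, once it is actually picked up, is the intended fix rather than a command-line flag. A minimal sketch of the behavior, assuming PyTorch with CUDA support:

import torch
import torch.nn as nn

model = nn.Linear(4, 4)
model.cuda()  # no argument: moves to torch.cuda.current_device(), which is cuda:0 by default
              # with CUDA_VISIBLE_DEVICES=1 set, cuda:0 maps to physical GPU 1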

System Info

llama-13b
Anaconda3 2023.03
Ubuntu 22.04
NVIDIA RTX 3090, with CUDA 11.4 and Driver 470.74

Ph0rk0z commented 1 year ago

CUDA_VISIBLE_DEVICES=1 python server.py works fine for me
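
CUDA_VISIBLE_DEVICES works by hiding the other GPUs from the process, so the remaining card always shows up inside the process as cuda:0. A quick sanity check that the variable is actually being honored (assuming PyTorch with CUDA support in the active environment):

CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"

On a 4-GPU box this should print 1 followed by the device name; if it prints 4, the variable is not reaching the process.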

darkacorn commented 1 year ago

you can also use --device-id in the startup params

longzhenren commented 1 year ago

I am sorry, but I think this option is not available in text-generation-webui; I know it is a valid parameter in stable-diffusion-webui, though.

(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id=3
usage: server.py [-h] [--notebook] [--chat] [--character CHARACTER] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--no-stream] [--settings SETTINGS] [--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--cpu] [--auto-devices] [--gpu-memory GPU_MEMORY [GPU_MEMORY ...]] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16] [--no-cache] [--xformers] [--sdp-attention] [--trust-remote-code] [--load-in-4bit] [--compute_dtype COMPUTE_DTYPE] [--quant_type QUANT_TYPE] [--use_double_quant] [--threads THREADS] [--n_batch N_BATCH] [--no-mmap] [--mlock] [--cache-capacity CACHE_CAPACITY] [--n-gpu-layers N_GPU_LAYERS] [--n_ctx N_CTX] [--llama_cpp_seed LLAMA_CPP_SEED] [--wbits WBITS] [--model_type MODEL_TYPE] [--groupsize GROUPSIZE] [--pre_layer PRE_LAYER [PRE_LAYER ...]] [--checkpoint CHECKPOINT] [--monkey-patch] [--quant_attn] [--warmup_autotune] [--fused_mlp] [--gptq-for-llama] [--autogptq] [--triton] [--desc_act] [--flexgen] [--percent PERCENT [PERCENT ...]] [--compress-weight] [--pin-weight [PIN_WEIGHT]] [--deepspeed] [--nvme-offload-dir NVME_OFFLOAD_DIR] [--local_rank LOCAL_RANK] [--rwkv-strategy RWKV_STRATEGY] [--rwkv-cuda-on] [--listen] [--listen-host LISTEN_HOST] [--listen-port LISTEN_PORT] [--share] [--auto-launch] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--api] [--api-blocking-port API_BLOCKING_PORT] [--api-streaming-port API_STREAMING_PORT] [--public-api] [--multimodal-pipeline MULTIMODAL_PIPELINE]
server.py: error: unrecognized arguments: --device-id=3

(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id 3
usage: server.py [... same usage listing as above ...]
server.py: error: unrecognized arguments: --device-id 3

darkacorn commented 1 year ago

ok, my bad, sorry about that. Let me get to my computer and try it .. I have 2 cards, so I should be able to see the difference. I'll report back in a few minutes.

darkacorn commented 1 year ago

device 0

https://prnt.sc/IodeEPjE8sv0

device 1

https://prnt.sc/NEbE7ZNO5_n9

as you can see, it selects the correct GPU

--device-id does not work, as you pointed out .. that was my mistake, but the solution Ph0rk0z gave does work

longzhenren commented 1 year ago

Hmmm, that is so strange. Did you use the Python installed in a conda environment?

darkacorn commented 1 year ago

I basically run it in the conda env from the one-click installer.

longzhenren commented 1 year ago

Okay, I will try that. However, I am using the llama-13b model instead of the default GPTQ one, so I have to modify the shell script XD Thanks for your help!

Ph0rk0z commented 1 year ago

I have a conda-installed env, but this works for all envs on my system, for instance when running audio generators, SD, etc.

maxchiron commented 1 year ago

Setting the visible GPU IDs in server.py works for me. Add the code below to server.py:

os.environ['CUDA_VISIBLE_DEVICES'] = '1, 2, 3'
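
For this to take effect, the assignment must run before the CUDA runtime initializes, which is why its placement in server.py matters. A minimal sketch of the top of the file, under the assumption that nothing imports torch earlier:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # set before anything touches CUDA

import torch  # safe: the variable is already in place when torch initializes CUDA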

longzhenren commented 1 year ago

Sorry, but this doesn't work for me.

maxchiron commented 1 year ago

Where did you add the line? It should be added after os is imported and before all other code.

longzhenren commented 1 year ago

Sorry to bother you, but there was a problem with my Anaconda environment. I fixed it and it finally works. In detail: I had installed Anaconda into /raid/another_username/data/anaconda3/environments, and another user on the server changed the permissions of that directory. As a result, I was actually using the system default Python environment, whose PyTorch build has no CUDA support. I checked and found this problem.
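
A quick, generic way to catch this class of problem is to confirm which interpreter is actually running and whether its PyTorch build sees CUDA:

which python
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

If torch.version.cuda prints None or is_available() prints False, the environment is broken, and no webui flag or variable will help.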

Verinoth commented 8 months ago

To aid others who arrive here via a search engine (this is one of the top results when searching for how to specify a particular GPU): launching through server.py appears deprecated, so the line must now be added to one_click.py instead, and the setting will take effect on the next launch.

os.environ['CUDA_VISIBLE_DEVICES'] = '1, 2, 3'

where 1, 2, 3 are the GPU IDs that you would like exposed to the software.
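
If editing one_click.py is undesirable, the same effect should be achievable from the shell, since child processes inherit environment variables (assuming the Linux launch script mentioned elsewhere in this thread):

CUDA_VISIBLE_DEVICES=1,2,3 ./start_linux.sh

This avoids carrying a local patch that may conflict with future updates.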

rabbitbytes commented 4 weeks ago

As Google got me here, I thought I would post what I found to fix this issue on Linux: edit server.py and, under import os, add a new line os.environ['CUDA_VISIBLE_DEVICES'] = '1' for your second GPU (or '2', etc.). This works whether you use start_linux.sh or create your own venv and run venv/bin/python ./server.py. I have not tested this on Windows. I post this here for noobs who, like me, just need to know where it goes; I hope it gets the wrapper-script vs. run-script distinction into noobs' minds like mine. "A kick in the butt can be a great teacher."