CUDA_VISIBLE_DEVICES=1 python server.py works fine for me
you can also use --device-id in the startup params
I am sorry, but I think this option is not available in text-generation-webui; I know it is a valid parameter in stable-diffusion-webui.
```
(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id=3
usage: server.py [-h] [--notebook] [--chat] [--character CHARACTER] [--model MODEL] [--lora LORA [LORA ...]]
                 [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--no-stream] [--settings SETTINGS]
                 [--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--cpu] [--auto-devices]
                 [--gpu-memory GPU_MEMORY [GPU_MEMORY ...]] [--cpu-memory CPU_MEMORY] [--disk]
                 [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16] [--no-cache] [--xformers]
                 [--sdp-attention] [--trust-remote-code] [--load-in-4bit] [--compute_dtype COMPUTE_DTYPE]
                 [--quant_type QUANT_TYPE] [--use_double_quant] [--threads THREADS] [--n_batch N_BATCH]
                 [--no-mmap] [--mlock] [--cache-capacity CACHE_CAPACITY] [--n-gpu-layers N_GPU_LAYERS]
                 [--n_ctx N_CTX] [--llama_cpp_seed LLAMA_CPP_SEED] [--wbits WBITS] [--model_type MODEL_TYPE]
                 [--groupsize GROUPSIZE] [--pre_layer PRE_LAYER [PRE_LAYER ...]] [--checkpoint CHECKPOINT]
                 [--monkey-patch] [--quant_attn] [--warmup_autotune] [--fused_mlp] [--gptq-for-llama]
                 [--autogptq] [--triton] [--desc_act] [--flexgen] [--percent PERCENT [PERCENT ...]]
                 [--compress-weight] [--pin-weight [PIN_WEIGHT]] [--deepspeed]
                 [--nvme-offload-dir NVME_OFFLOAD_DIR] [--local_rank LOCAL_RANK]
                 [--rwkv-strategy RWKV_STRATEGY] [--rwkv-cuda-on] [--listen] [--listen-host LISTEN_HOST]
                 [--listen-port LISTEN_PORT] [--share] [--auto-launch] [--gradio-auth GRADIO_AUTH]
                 [--gradio-auth-path GRADIO_AUTH_PATH] [--api] [--api-blocking-port API_BLOCKING_PORT]
                 [--api-streaming-port API_STREAMING_PORT] [--public-api]
                 [--multimodal-pipeline MULTIMODAL_PIPELINE]
server.py: error: unrecognized arguments: --device-id=3

(webui) amur@bsc-7:~/text-generation-webui$ python server.py --model llama-13b --chat --device-id 3
usage: server.py [same usage output as above]
server.py: error: unrecognized arguments: --device-id 3
```
OK, my bad, sorry about that. Let me get to my computer and try it; I've got 2 cards, so I should be able to see the difference. I'll report back in a few minutes.
device 0
device 1
As you can see, it selects the correct GPU.
--device-id does not work, as you pointed out; that was my mistake, but the solution Ph0rk0z gave does work.
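For anyone who wants to double-check which card was actually selected: CUDA renumbers the visible devices from 0 inside the process, so the card you exposed as physical GPU 1 shows up as cuda:0. A minimal sketch to confirm, assuming a CUDA-enabled PyTorch install:

```python
import torch

# Launched as: CUDA_VISIBLE_DEVICES=1 python check_gpu.py  (hypothetical script name)
print(torch.cuda.device_count())      # -> 1 (only the exposed card is visible)
print(torch.cuda.get_device_name(0))  # name of physical GPU 1, now indexed as 0
```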
Hmmm, that is so strange. Did you use Python installed in a conda environment?
I basically run it in the conda env from the one-click installer.
Okay, I will try that. However, I am using the llama-13b model instead of the default GPTQ one, so I have to modify the shell script XD Thanks for your help!
I have a conda-installed env, but this works for all envs on my system, for instance when running audio generators, SD, etc.
Setting the visible GPU IDs in server.py works for me. Add the code below to server.py:
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3'
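The placement matters: the variable has to be set before CUDA is initialized, which in practice means before torch (or anything that imports it) is loaded. A minimal sketch of the top of server.py under that assumption:

```python
import os

# Restrict which physical GPUs CUDA may use; must happen before torch is
# imported, because torch reads CUDA_VISIBLE_DEVICES when it initializes CUDA.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3'

import torch  # torch.cuda.device_count() will now report 3, indexed 0-2
```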
Sorry, but this doesn't work for me.
Where did you add the line? It should be added right after os is imported and before all other code.
Sorry to bother you, but the problem turned out to be with my Anaconda environment. I fixed it and it finally works. In detail: I installed Anaconda into /raid/another_username/data/anaconda3/environments, and another user on the server changed the permissions of that directory. As a result, I was actually using the system default Python environment, whose PyTorch has no CUDA support. I checked and found that this was the problem.
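For anyone hitting a similar dead end, a quick way to check whether the Python you are actually running has a CUDA-enabled PyTorch (a small diagnostic sketch, nothing specific to this repo):

```python
import sys
import torch

print(sys.executable)             # which Python interpreter is really running
print(torch.__version__)          # e.g. '2.0.1+cu117' (CUDA build) vs '2.0.1+cpu'
print(torch.cuda.is_available())  # False means CUDA_VISIBLE_DEVICES can't help
print(torch.cuda.device_count())  # GPUs visible after CUDA_VISIBLE_DEVICES applies
```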
To aid others who arrive here via a search engine (this is one of the top results when searching for how to specify a particular GPU): editing server.py appears to be deprecated, and the line must now be added to one_click.py; the setting takes effect on the next launch.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3'
where 1, 2, 3 are the GPU IDs that you would like exposed to the software.
As Google got me here, I thought I would post what I found to fix this issue on Linux: edit server.py and, under import os, add a new line os.environ['CUDA_VISIBLE_DEVICES'] = '1' for your second GPU, or '2', etc. This works even if you use start_linux.sh, or if you create your own venv and use venv/bin/python ./server.py. I have not tested this on Windows. I post this here for noobs who just need to know where it goes; I hope it gets the wrapper-script-vs-run-script distinction into noobs' minds like mine. "A kick in the butt can be a great teacher."
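One optional refinement (my own suggestion, not from the thread): use setdefault instead of a plain assignment, so a CUDA_VISIBLE_DEVICES value set on the command line still overrides the hard-coded one:

```python
import os

# Applies the hard-coded ID only when the variable is not already set,
# so `CUDA_VISIBLE_DEVICES=0 ./start_linux.sh` still takes precedence.
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '1')
```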
Describe the bug
I ran this on a server with 4x RTX 3090. GPU0 is busy with other tasks, so I want to use GPU1 or another free GPU. I set the CUDA_VISIBLE_DEVICES env var, but it doesn't work. How do I specify which GPU to run on?
Is there an existing issue for this?
Reproduction
Always, when GPU0 is busy.
Screenshot
No response
Logs
System Info