nan0s7 / nfancurve

A small and lightweight POSIX script for using a custom fan curve in Linux for those with an Nvidia GPU.
GNU General Public License v3.0

Doesn't work on multi-GPU systems #22

Closed karvn closed 4 years ago

karvn commented 4 years ago

Hello, I have two GPUs on my Ubuntu Linux machine, but nfancurve only controls the first one; it has no effect on the second. Both GPUs are GTX 1080 Ti cards, and each has one fan. This is the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
| 65%   79C    P2   227W / 250W |    123MiB / 11176MiB |     52%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 50%   84C    P2    82W / 250W |     16MiB / 11178MiB |     45%      Default |
+-------------------------------+----------------------+----------------------+

What can I do about this problem?

nan0s7 commented 4 years ago

Hmm, can you give me the output of the script when it runs? If you run it with -l it will give more useful information, e.g. ./temp.sh -l, or however you normally run it.

karvn commented 4 years ago

@nan0s7 This is the full log from running ./temp.sh -l:

################################################################################

nan0s7's script for automatically managing GPU fan speed

################################################################################

Configuration file: /home/tom/nfancurve-019.1/config

Attribute 'GPUFanControlState' (awar7:1[gpu:0]) assigned value 1.

Started process for n-GPUs and n-Fans

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 45.

t=54 ot=200 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 45.

t=52 ot=54 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 35.

t=45 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 35.

t=44 ot=45 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1
t=43 ot=44 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1
t=45 ot=43 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 45.

t=48 ot=45 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 45.

t=49 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=50 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=50 ot=50 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=51 ot=50 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=51 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 35.

t=44 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1
t=46 ot=44 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=50 ot=46 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=51 ot=50 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=52 ot=51 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=52 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=53 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=52 ot=53 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=55 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=53 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=54 ot=53 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=55 ot=54 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=55 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=54 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=55 ot=54 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=55 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 50.

t=56 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=50 osp=50 maxt=85 mint=25 otl=3
t=56 ot=56 td=0 s=7 gpu=0 fan=0 cd=0 nsp=50 osp=45 maxt=85 mint=25 otl=3
t=55 ot=56 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=54 ot=55 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=53 ot=54 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=52 ot=53 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=52 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=51 ot=52 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=51 ot=51 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=51 ot=51 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=50 ot=51 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=50 ot=50 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=50 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=49 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=49 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=46 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=46 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=46 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=46 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=48 ot=47 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2

Attribute 'GPUTargetFanSpeed' (awar7:1[fan:0]) assigned value 35.

t=45 ot=48 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=35 maxt=85 mint=25 otl=1
t=45 ot=45 td=0 s=7 gpu=0 fan=0 cd=0 nsp=35 osp=45 maxt=85 mint=25 otl=1
t=46 ot=45 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
t=47 ot=46 td=0 s=7 gpu=0 fan=0 cd=0 nsp=45 osp=45 maxt=85 mint=25 otl=2
^C
Attribute 'GPUFanControlState' (awar7:1[gpu:0]) assigned value 0.

Fan control set back to auto mode
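
Every status entry in the log above reports gpu=0 and fan=0, which matches the report that the second GPU is never touched. A quick way to check this on a saved log is sketched below. The function name list_logged_gpus is made up for illustration and is not part of temp.sh, and the sketch assumes the log uses the space-separated key=value fields seen above.

```shell
#!/bin/sh
# Sketch only: list_logged_gpus is a hypothetical helper, not part of temp.sh.
# It reads a saved log on stdin and prints each distinct gpu=N field once.
list_logged_gpus() {
    # Put every space-separated field on its own line,
    # keep only the gpu=N fields, then drop duplicates.
    tr ' ' '\n' | grep '^gpu=[0-9]' | sort -u
}
```

If the log were saved to a file (the name is up to you), piping it into list_logged_gpus would print a single gpu=0 line for the run above, whereas a run that manages both cards would be expected to list gpu=1 as well.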

kenbeese commented 4 years ago

I had the same issue and fixed it with the following change:

--- a/temp.sh
+++ b/temp.sh
@@ -233,7 +233,6 @@ if [ -z "$num_gpus" ]; then
    prf "No GPUs detected"; exit 1
 elif [ "${#num_gpus}" -gt "2" ]; then
    num_gpus="${num_gpus%* GPUs on*}"
-else
    num_gpus_loop="$((num_gpus-1))"; num_fans_loop="$((num_fans-1))"
    prf "Number of GPUs detected: $num_gpus"
 fi
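
In the code being patched, the loop counters are only computed in the else branch, so they are never set when the GPU-count string is longer than two characters and takes the elif branch. A minimal sketch of the intended behaviour follows: normalise the count first, then always derive the loop bound. The helper name gpu_loop_bound and the example string "2 GPUs on host:0" are assumptions for illustration; the thread does not show the raw string or the maintainer's eventual fix.

```shell
#!/bin/sh
# Sketch only: gpu_loop_bound is a hypothetical helper, not a function from temp.sh.
# It takes the raw GPU-count string and prints the last loop index (count - 1).
gpu_loop_bound() {
    raw="$1"
    # Nothing detected: signal failure to the caller
    if [ -z "$raw" ]; then
        return 1
    fi
    # Longer strings are assumed to look like "2 GPUs on host:0";
    # the expansion from the diff keeps only the leading number.
    if [ "${#raw}" -gt 2 ]; then
        raw="${raw%* GPUs on*}"
    fi
    # Always derive the loop bound, whichever branch ran above
    printf '%s\n' "$((raw - 1))"
}
```

Called with either "2" or "2 GPUs on host:0", this prints 1, so a loop running from 0 to that bound would visit both GPUs.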

nan0s7 commented 4 years ago

Huh... I am honestly not sure what's going on in this section lol. I must have changed the logic and forgot to fix this particular part of the code. I'll post a fix soon!

nan0s7 commented 4 years ago

Alright, please test the latest temp.sh file and if that works I'll upload a 19.2 version. :)

karvn commented 4 years ago

@nan0s7 The latest temp.sh works perfectly; both GPUs can now be controlled by the script. Thank you! @kenbeese Thanks for your help!

nan0s7 commented 4 years ago

No worries, thanks for making me aware of the issue! And thanks for helping @kenbeese :D