nan0s7 / nfancurve

A small and lightweight POSIX script for using a custom fan curve in Linux for those with an Nvidia GPU.
GNU General Public License v3.0

Support for multi-fan GPUs #13

Closed edave closed 4 years ago

edave commented 5 years ago

Got the helpful error message asking me to submit an issue on GitHub for this. Thanks!

In my case, this is legitimate as I have a GPU with a liquid cooler (one blower fan on the GPU card itself, another on the radiator liquid cooler).

One thing that is clear, at least for my GPU, is that the two fans are operated at very different speeds. One idles at 17%, the other at 50%. On this GPU (an MSI RTX 2080), fan #0 is the radiator fan, while fan #1 is the blower fan on the card. However, I would not be surprised if that varied by manufacturer.

Here's the output from the script:

################################################################################
#          nan0s7's script for automatically managing GPU fan speed            #
################################################################################

Configuration file: /home/edave/other-repositories/nfancurve/config
Number of Fans detected: 4
Number of GPUs detected: 2
Submit an issue on my GitHub page... happy to fix this :D

4 Fans on mythril:1

    [0] mythril:1[fan:0] (Fan 0)

      Has the following name:
        FAN-0

    [1] mythril:1[fan:1] (Fan 1)

      Has the following name:
        FAN-1

    [2] mythril:1[fan:2] (Fan 2)

      Has the following name:
        FAN-2

    [3] mythril:1[fan:3] (Fan 3)

      Has the following name:
        FAN-3

2 GPUs on mythril:1

    [0] mythril:1[gpu:0] (GeForce RTX 2080)

      Has the following names:
        GPU-0
        GPU-a1581e88-f69a-dec1-b0db-210270c47e4c

    [1] mythril:1[gpu:1] (GeForce RTX 2080)

      Has the following names:
        GPU-1
        GPU-2144f4f6-6446-7c6f-59fb-d730a756b27e

  Attribute 'GPUFanControlState' (mythril:1[gpu:0]) assigned value 0.

  Attribute 'GPUFanControlState' (mythril:1[gpu:1]) assigned value 0.

Fan control set back to auto mode

nan0s7 commented 5 years ago

Hey, sorry for the delay in my response; I was on holiday. This is quite interesting though, I'll admit I didn't see this day coming anytime soon xD

Hopefully I can get a quick fix up soon, but in the long run I agree with you that it may vary depending on the manufacturer, so I'm planning on adding something to the config file. Or I could just take control of every detected fan and treat it like normal.

Would you want to have control over each individual fan? For example, the quick fix would mean that both fans for the first GPU would be controlled by the temperature of that GPU, and likewise the second GPU's temperature would control its two fans. However, if one of the fans is louder than the other, I could add an option to the config file that lets you specify a second fan curve and assign a specific curve to individual fans. Obviously it'd still depend on the associated GPU's temperature, but it could help with noise and/or power draw.

edave commented 5 years ago

Having individual control over each fan would be key in this case: it's pretty clear that, for whatever reason, MSI prefers to run the blower and radiator fans at drastically different percentages (at low temps, there's a difference of 30%).

Thanks so much for this!! Huge for doing machine learning tasks where you're running the GPU at capacity for many hours/days straight

nan0s7 commented 5 years ago

No problem! I'm happy you're finding it useful :)

Alright so I've thrown together a really dodgy version that should hopefully work as intended/expected. I've added a secondary temp and fan curve to the config file that you can change. The fans in the script aren't aligned to the fans in your computer; for example, currently it treats fans 0 and 1 as the radiator fans, and fans 2 and 3 as the blower fans. I've enabled logging so just copy and paste the output of the script so I can see where to change things.

I know you mentioned that fan 0 is the rad and 1 is the blower, but I just want to make sure it works correctly AND that the fan speed difference you've noticed is present with the current script.

Hopefully you're able to help me iron out some things before I release the stable version (just being my tester). If you don't have the time or can't for whatever reason, don't stress; it just means the updates will not come as quickly, since I'll have to do my own testing.

Nonetheless, thanks for reporting the issue! :)

edave commented 5 years ago

Finally got to test it out this morning- sorry for the delay!

The multi-fan works! But to be clear, only for the 1st GPU. It appears there may be something up with the 2nd GPU? Kept getting these types of print statements: ./temp.sh: line 181: [: : integer expression expected. However, the script is taking over control of the fan speed for the 2nd GPU- confirmed that with Nvidia's X Server Settings app.

I've included the full output below.

################################################################################
#          nan0s7's script for automatically managing GPU fan speed            #
################################################################################

Configuration file: /home/edave/other-repositories/nfancurve/config
Number of Fans detected: 4
Number of GPUs detected: 2
tdiff average: 10
tdiff average: 10

  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 1.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 1.

esp=25 25 25 25 25 25 25 25 25 25 25 40 40 40 40 40 40 40 40 40 40 55 55 55 55 55 55 55 55 55 55 70 70 70 70 70 70 70 70 70 70 85 85 85 85 85 85 85 85 85 85 espln=51
esp2=15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 30 30 30 30 30 30 30 30 30 30 45 45 45 45 45 45 45 45 45 45 60 60 60 60 60 60 60 60 60 60 75 75 75 75 75 75 75 75 75 75 espln2=76
Started beta process for 4-fans and 2-gpus

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:0]) assigned value 25.

    t=29 oldt=29 tdiff=29 slp=7 gpu=0
    nspd?=25 nspd=25 cd=10
    cd2=7 mint=25 oldspd=25

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:1]) assigned value 25.

    t=27 oldt=27 tdiff=27 slp=7 gpu=1
    nspd?=25 nspd=25 cd=10
    cd2=7 mint=25 oldspd=25

./temp.sh: line 181: [: : integer expression expected
    t=27 oldt2=0 tdiff=27 slp=7 gpu=1
    nspd?=15 nspd=25 cd=10
    cd2=7 mint2= oldspd=15

./temp.sh: line 181: [: : integer expression expected
    t=27 oldt2=0 tdiff=27 slp=7 gpu=1
    nspd?=15 nspd=25 cd=10
    cd2=7 mint2= oldspd=15

    t=29 oldt=29 tdiff=27 slp=7 gpu=0
    nspd?=25 nspd=25 cd=10
    cd2=7 mint=25 oldspd=25

    t=27 oldt=27 tdiff=27 slp=7 gpu=1
    nspd?=25 nspd=25 cd=10
    cd2=7 mint=25 oldspd=25

./temp.sh: line 181: [: : integer expression expected
    t=27 oldt2=0 tdiff=27 slp=7 gpu=1
    nspd?=15 nspd=25 cd=10
    cd2=7 mint2= oldspd=15

./temp.sh: line 181: [: : integer expression expected
    t=27 oldt2=0 tdiff=27 slp=7 gpu=1
    nspd?=15 nspd=25 cd=10
    cd2=7 mint2= oldspd=15

^C
  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 0.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 0.

Fan control set back to auto mode
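
The `[: : integer expression expected` message in the log above is what the shell prints when an integer comparison such as `-lt` receives an empty string, and the debug lines show `mint2=` with no value right after each error. What line 181 of temp.sh actually contains is not shown in this thread, so the following is only a minimal sketch of the failure mode and one possible guard. The helper name `is_int` and the fallback value of 15 are assumptions made for illustration, not part of nfancurve.

```shell
#!/bin/sh
# Hypothetical helper (not from nfancurve): succeeds only for a
# non-empty string made up entirely of digits.
is_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

mint2=""   # empty, mirroring the "mint2=" seen in the log
t=27       # temperature reported in the same log lines

# An unguarded [ "$t" -lt "$mint2" ] would raise the error here,
# so fall back to an assumed default before comparing.
if ! is_int "$mint2"; then
    mint2=15   # assumed default, for illustration only
fi

if [ "$t" -lt "$mint2" ]; then
    echo "below minimum"
else
    echo "at or above minimum"
fi
```

Validating the value once, right after the config is read, keeps every later comparison simple and avoids repeating the check inside the polling loop.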
nan0s7 commented 5 years ago

Ah cool that's a good start! I'm working on a better solution right now. I'll see if I can make it more usable so you can give it a test.

nan0s7 commented 5 years ago

Alright I've uploaded the latest version for you to test out. Let me know how that goes; technically adding this feature to my script has made it ever so slightly more efficient for everyone so that is a bonus xD

EDIT: I've configured the config file to be what I assume you would use for your specific configuration, but feel free to change things around if you like. In theory it should still work as intended ;P

EDIT2 (9 hours later): I've made a small update that may fix an error you may have gotten with the update I gave 9 hours ago. I have a lot of things to fix up before this is going to function how we want it to (i.e. to obey the config properly), but right now I'm focussing on the logic of the code and making sure it actually works.

edave commented 5 years ago

I took it for a spin (ha!) early yesterday. I was in the middle of training on a data set, so one GPU was not under much load, while the other was at 99% utilization. Everything looked to be functioning as expected, but I did not have the time to go through and look for any bugs or anything unexpected (including getting you a copy of the output).

I'm out of the office today but will investigate further tomorrow. Thanks again!

nan0s7 commented 5 years ago

Sorry for the late reply, I didn't realise I didn't make a comment xD

But that's good news then! Oh don't worry, take your time. I'm in no rush, and it does give me some time to improve things while I wait for your feedback.

I pushed a small update just now that should make sure that the config is used properly while the script is running. I doubt it added any new bugs but who knows xD

edave commented 5 years ago

Super- I did some testing, where I set the first fan curve/temp at 100% to make any changes evident. The script properly detected the multiple GPUs and fans.

I think I must not understand the config file or not be setting something correctly. It seems that something is happening where Fan 0 is not being controlled (despite being detected/labeled in the output) and Fan 1 is receiving the settings for Fan 0. Overall though, seems like we're getting very close!

Here's some output, the config file, and a screenshot from Nvidia X Settings for reference:

Config File

declare -a fcurve=( "100" "90" "55" "70" "85" ) # fan speeds
declare -a tcurve=( "25" "45" "55" "65" "75" ) # temperatures

# These two arrays are for GPUs that have a secondary fan that you may wish
#  to control separately, especially if it is water-cooled.
declare -a fcurve2=( "15" "30" "45" "60" "75" )
declare -a tcurve2=( "35" "45" "55" "65" "75" )

# First number in array is fan 0, second number is fan 1, etc. If the number
#  is 1, that indicates that the script should use the first curve for that
#  fan. The same goes for the number 2.
declare -a which_curve=( "1" "2" "1" "2" )

#
default_fan="0"

# Similar to which_curve, but instead lets the script know which of the GPU's
#  has which fan. i.e. element 0 in the array being set to 0 means that fan 0
#  is assigned to GPU 0, element 1 is 0 too, meaning fan 1 is on GPU 0 as well
declare -a fan2gpu=( "0" "0" "1" "1" )
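
To make the `which_curve` and `fan2gpu` settings above concrete, here is a small sketch that walks the four fans and reports which GPU's temperature and which curve each one would follow. It uses space-separated strings instead of the bash arrays in the config so that it runs under plain `sh`. The helper `nth` is a hypothetical name, and this is not the script's own lookup code.

```shell
#!/bin/sh
# Same values as the config above, written as plain strings.
which_curve="1 2 1 2"
fan2gpu="0 0 1 1"

# nth LIST INDEX: print the zero-based INDEX-th word of LIST.
# Hypothetical helper, not part of nfancurve.
nth() {
    list=$1 idx=$2 i=0
    for word in $list; do
        if [ "$i" -eq "$idx" ]; then
            echo "$word"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# One line per fan: which GPU it belongs to and which curve it uses.
fan=0
for unused in $fan2gpu; do
    echo "fan $fan -> gpu $(nth "$fan2gpu" "$fan"), curve $(nth "$which_curve" "$fan")"
    fan=$((fan + 1))
done
```

With these values, fans 0 and 1 follow GPU 0 and fans 2 and 3 follow GPU 1, with the first and second curves alternating.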

Terminal Output

edave@mythril:~/other-repositories/nfancurve$ ./temp.sh 

################################################################################
#          nan0s7's script for automatically managing GPU fan speed            #
################################################################################

Configuration file: /home/edave/other-repositories/nfancurve/config
Number of Fans detected: 4
Number of GPUs detected: 2
tdiff average: 12
tdiff average: 10

  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 1.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 1.

esp=100 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 55 55 55 55 55 55 55 55 55 70 70 70 70 70 70 70 70 70 70 85 85 85 85 85 85 85 85 85 85 espln=51
esp2=15 15 15 15 15 15 15 15 15 15 15 30 30 30 30 30 30 30 30 30 30 45 45 45 45 45 45 45 45 45 45 60 60 60 60 60 60 60 60 60 60 75 75 75 75 75 75 75 75 75 75 espln2=51
Started process for n-GPUs and n-Fans

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:0]) assigned value 90.

    t=29 oldt=0 tdiff=29 slp=7 gpu=0
    nspd?=90 nspd=90 cd=
    cd2=9 mint=25 oldspd=55 fan=0

    t=29 oldt=29 tdiff=29 slp=7 gpu=0
    nspd?=15 nspd=90 cd=
    cd2=7 mint=25 oldspd=15 fan=1

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:2]) assigned value 90.

    t=28 oldt=0 tdiff=28 slp=7 gpu=1
    nspd?=90 nspd=90 cd=
    cd2=9 mint=25 oldspd=55 fan=2

    t=28 oldt=28 tdiff=28 slp=7 gpu=1
    nspd?=15 nspd=90 cd=
    cd2=7 mint=25 oldspd=15 fan=3

    t=29 oldt=29 tdiff=28 slp=7 gpu=0
    nspd?=90 nspd=90 cd=
    cd2=9 mint=25 oldspd=90 fan=0

    t=29 oldt=29 tdiff=28 slp=7 gpu=0
    nspd?=15 nspd=90 cd=
    cd2=7 mint=25 oldspd=15 fan=1

    t=28 oldt=28 tdiff=28 slp=7 gpu=1
    nspd?=90 nspd=90 cd=
    cd2=9 mint=25 oldspd=90 fan=2

    t=28 oldt=28 tdiff=28 slp=7 gpu=1
    nspd?=15 nspd=90 cd=
    cd2=7 mint=25 oldspd=15 fan=3

^C
  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 0.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 0.

Fan control set back to auto mode

NVidia X Server Settings

(while script is running) screenshot from 2019-01-21 20-13-29

nan0s7 commented 5 years ago

Hmm... well at least we're getting closer!

I've added some more debugging information in the latest update around the areas I believe the problem is hiding, and I updated the config default values for the "which_curve" array because how you have it now is what it was supposed to be; whoops.

I did notice that your min_t is the same as the first element in the temperature array for the first curve (they're both equal to 25). I don't think this will cause any problems, but it just means the first value in your fan array (in the config above, it's 100) won't be used. I'll make a note to add a warning for that :P

edave commented 5 years ago

Ah yea, thanks for pointing that out! I updated the min_t, and also set the fan1/2 curves to be high values again so you could easily see the result.

Here's the output using the latest.

Config

# min_t is the temperature at which every temperature below it will cause
#  the fan speed to be set to 0%, and everything above will be whatever the
#  first speed in fcurve is (default of 25%)
# min_t2 is only used with the second fan speed and temperature arrays, so
#  there is no need to change it unless you're using the second curve
min_t="15"
min_t2="15"

# How long the script should wait until checking for a change in temps
# The first value (default: 7) is for long idle periods
# The second value (default: 5) is if the script detects a change in temps
#  but isn't great enough to alter the fan speed
long_s="7"
short_s="5"

# By default it's set up so that when the temp is less than or equal to 35
#  degrees, the fan speed will be set to 25%. Next, if the temp is between 36
#  and 45, the fan speed should be set to 40%, etc.
# The last temperature value will be the maximum temperature before 100% fan
#  speed will be set
# You can make the array as big or as small as you require, as long as they
#  both end up being the same size
declare -a fcurve=( "75" "80" "55" "70" "85" ) # fan speeds
declare -a tcurve=( "20" "45" "55" "65" "75" ) # temperatures

# These two arrays are for GPUs that have a secondary fan that you may wish
#  to control separately, especially if it is water-cooled.
declare -a fcurve2=( "90" "95" "45" "60" "75" )
declare -a tcurve2=( "20" "45" "55" "65" "75" )

# First number in array is fan 0, second number is fan 1, etc. If the number
#  is 1, that indicates that the script should use the first curve for that
#  fan. The same goes for the number 2.
declare -a which_curve=( "1" "2" "1" "2" )

#
default_fan="0"

# Similar to which_curve, but instead lets the script know which of the GPU's
#  has which fan. i.e. element 0 in the array being set to 0 means that fan 0
#  is assigned to GPU 0, element 1 is 0 too, meaning fan 1 is on GPU 0 as well
declare -a fan2gpu=( "0" "0" "1" "1" )
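
The `esp=` line in the output that follows looks like a per-degree lookup table built from `min_t`, `fcurve` and `tcurve`. The sketch below reproduces that table from the config values above, assuming the "less than or equal" rule described in the config comments. The function name `expand_curve` is an assumption, and the script's real implementation may differ.

```shell
#!/bin/sh
# Values copied from the config above.
min_t=15
fcurve="75 80 55 70 85"
tcurve="20 45 55 65 75"

# expand_curve MIN_T FCURVE TCURVE: print one fan speed for every
# degree from MIN_T up to the last temperature in TCURVE.
# Hypothetical name; a sketch, not nfancurve's own code.
expand_curve() {
    lo=$1 speeds=$2 temps=$3 out=""
    for limit in $temps; do
        speed=${speeds%% *}     # first remaining speed
        speeds=${speeds#* }     # drop it from the list
        while [ "$lo" -le "$limit" ]; do
            out="$out$speed "
            lo=$((lo + 1))
        done
    done
    echo "${out% }"
}

esp=$(expand_curve "$min_t" "$fcurve" "$tcurve")
set -- $esp
echo "espln=$#"

# The speed for a temperature t sits at offset (t - min_t).
t=29
shift $((t - min_t))
echo "speed at ${t}C: $1"
```

This prints `espln=61` and a speed of 80 at 29 degrees, which agrees with the `espln=61` and `nspd=80` values in the log. Precomputing the table means the polling loop only needs a subtraction and an index instead of re-scanning the curve on every temperature check.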

Script Output

edave@mythril:~/other-repositories/nfancurve$ ./temp.sh 

################################################################################
#          nan0s7's script for automatically managing GPU fan speed            #
################################################################################

Configuration file: /home/edave/other-repositories/nfancurve/config
Number of Fans detected: 4
Number of GPUs detected: 2
tdiff average: 13
tdiff average: 13

  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 1.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 1.

esp=75 75 75 75 75 75 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 55 55 55 55 55 55 55 55 55 55 70 70 70 70 70 70 70 70 70 70 85 85 85 85 85 85 85 85 85 85 espln=61
esp2=90 90 90 90 90 90 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 45 45 45 45 45 45 45 45 45 45 60 60 60 60 60 60 60 60 60 60 75 75 75 75 75 75 75 75 75 75 espln2=61
Started process for n-GPUs and n-Fans

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:0]) assigned value 80.

    t=29 oldt=0 tdiff=29 slp=7 gpu=0
    nspd?=80 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=70 fan=0 z=yes

    t=29 oldt=29 tdiff=29 slp=7 gpu=0
    nspd?=95 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=95 fan=1 z=yes

  Attribute 'GPUTargetFanSpeed' (mythril:0[fan:2]) assigned value 80.

    t=28 oldt=0 tdiff=28 slp=7 gpu=1
    nspd?=80 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=70 fan=2 z=yes

    t=28 oldt=28 tdiff=28 slp=7 gpu=1
    nspd?=95 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=95 fan=3 z=yes

    t=29 oldt=29 tdiff=28 slp=7 gpu=0
    nspd?=80 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=80 fan=0 z=no

    t=29 oldt=29 tdiff=28 slp=7 gpu=0
    nspd?=95 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=95 fan=1 z=yes

    t=28 oldt=28 tdiff=28 slp=7 gpu=1
    nspd?=80 nspd=80 cd=13 maxt=75
    cd2=10 mint=15 oldspd=80 fan=2 z=no

^C
  Attribute 'GPUFanControlState' (mythril:0[gpu:0]) assigned value 0.

  Attribute 'GPUFanControlState' (mythril:0[gpu:1]) assigned value 0.

Fan control set back to auto mode

GPU 0 Nvidia X Server

screenshot from 2019-01-23 13-01-22

GPU 1 NVidia X Server

screenshot from 2019-01-23 13-01-24

nan0s7 commented 5 years ago

Alright awesome, I think I fixed it now... :P

I also added that check I mentioned about min_t being the same as one of the values of tcurve, too.

Fingers crossed xD

nan0s7 commented 5 years ago

Hey, I'm not sure if you're still using this script but if you are I'd love to hear back to see how it's going. If it's all working fine I'll close this issue, and publish the new release. Any issues I'd be happy to sort out.

louissmit commented 5 years ago

Hey @nan0s7 I'm running 2.17.1 of your script on a EVGA RTX 2080ti and it only seems to control one of the fans. In NVIDIA X server settings I can control both manually, the script just only controls the default fan (in the config). I don't know if the outcome of this thread resolves this issue and if it has been published in 2.17.1, but if you could help me make this work that would be great! Thanks in advance

nan0s7 commented 5 years ago

Hey thanks for taking a look! That's certainly weird... how are you cooling the 2080ti? Like watercooled or is it just the stock fans or something. I may have solved the issue sometime after the latest release, so give the latest source version a go. You can enable logging by adding -l (that's an L) as a launch parameter; just copy and paste the output here after it's been running for a few seconds. It would also help figure out what's going on if you could post the output of nvidia-settings -q fans and nvidia-settings -q gpus.

But yeah, if it's something wrong with the way my script works I'd be more than happy to fix this so long as you can test it out for me! :P
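
For reference, the output of `nvidia-settings -q fans` and `nvidia-settings -q gpus` begins with a summary line such as `4 Fans on mythril:1`, as seen earlier in this thread. One way to pull the count out of that line is sketched below. It runs against a pasted sample so that it works without an Nvidia driver, and `count_from_header` is a hypothetical name. This is not necessarily how nfancurve detects fans.

```shell
#!/bin/sh
# count_from_header: read nvidia-settings query output on stdin and
# print the number from the first "N Fans on ..." or "N GPUs on ..." line.
# Hypothetical helper, not part of nfancurve.
count_from_header() {
    awk '/^[0-9]+ (Fans?|GPUs?) on /{print $1; exit}'
}

# Sample text in the shape shown elsewhere in this thread.
sample='
2 Fans on louis-desktop:0

    [0] louis-desktop:0[fan:0] (Fan 0)

    [1] louis-desktop:0[fan:1] (Fan 1)
'

printf '%s\n' "$sample" | count_from_header
```

On a real system the sample would be replaced by a pipe, for example `nvidia-settings -q fans | count_from_header`.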

louissmit commented 5 years ago

Hey, yeah it's just the stock fans. I checked out the latest master and these are your requested logs:

Number of GPUs detected: 1

  Attribute 'GPUFanControlState' (louis-desktop:0[gpu:0]) assigned value 1.

...
Started process for 1 GPU and 1 Fan

So it seems the script doesn't recognize the second fan that nvidia-settings does see:

2 Fans on louis-desktop:0

    [0] louis-desktop:0[fan:0] (Fan 0)

      Has the following name:
        FAN-0

    [1] louis-desktop:0[fan:1] (Fan 1)

      Has the following name:
        FAN-1

1 GPU on louis-desktop:0

    [0] louis-desktop:0[gpu:0] (GeForce RTX 2080 Ti)

      Has the following names:
        GPU-0

nan0s7 commented 5 years ago

Wait so how many fans does my script say there are detected? Also if you run the script with the logs enabled (./temp.sh -l or bash temp.sh -l) it'll spit out some lines every time the script checks for the latest temperature. It'll have like "t=... oldt=..." etc., which would also help with diagnosing the problem.

But yeah your GPU configuration seems like a normal setup so I don't know exactly why the script isn't changing the speeds of all of the fans on the graphics card.

Thanks for the quick reply!

EDIT: Just realised that it says Started process for 1 GPU and 1 Fan. So clearly my script has only detected the one fan.

EDIT2: Just uploaded a small commit to master if you wanna try that.

louissmit commented 5 years ago

Yess, that latest commit did the trick, thanks !

nan0s7 commented 5 years ago

Awesome! Now I can finally close this issue! :D

Thanks for the help!

davidgwyrick commented 4 years ago

I read and reread this thread and still can't get the script to control both fans on my RTX2080. The script indicates that it

Started process for n-GPUs and n-Fans

but only seems to be assigning a fan speed to fan:0

Attribute 'GPUTargetFanSpeed' (mazzucatomarvin:0[fan:0]) assigned value 40

Running ./temp.sh -l shows that it is tracking both fans, but not updating fan:1.

While the script is running, I can run the following command and it updates fan:1 speed.

nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=60"

Attribute 'GPUTargetFanSpeed' (mazzucatomarvin:0[fan:1]) assigned value 60.

There are 2 Fans!!

$ nvidia-settings -q fans

2 Fans on mazzucatomarvin:0

    [0] mazzucatomarvin:0[fan:0] (Fan 0)

      Has the following name:
        FAN-0

    [1] mazzucatomarvin:0[fan:1] (Fan 1)

      Has the following name:
        FAN-1

Am I missing something?!
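
As a stopgap, the manual command above can be repeated for every fan. The sketch below is a dry run that only prints the `nvidia-settings` calls, so it can be tried safely. The function name `set_all_fans` is an assumption, and it presumes manual control is already enabled (`GPUFanControlState` set to 1, as the script's log shows).

```shell
#!/bin/sh
# set_all_fans NUM_FANS SPEED: print one nvidia-settings call per fan.
# Hypothetical helper; remove the leading "echo" to apply the speeds.
set_all_fans() {
    num=$1 speed=$2 fan=0
    while [ "$fan" -lt "$num" ]; do
        echo nvidia-settings -a "[fan:$fan]/GPUTargetFanSpeed=$speed"
        fan=$((fan + 1))
    done
}

# Two fans at 60%, matching the manual test above.
set_all_fans 2 60
```

Because the commands are only echoed, the output can be inspected before anything is sent to the driver.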

nan0s7 commented 4 years ago

Hmm that is strange. The only thing I can't tell from your report is the temperatures you're seeing. If you're just using the one 2080, by default the script assigns a different set of temperature and fan-speed values to each fan. The setting in the config to look at is the which_curve string.

I'm looking through the code just to make sure I haven't accidentally missed something during the most recent update.

May I have the output of the start of the log when running ./temp.sh -l, including a few lines of temperature readings?

Thanks for posting this though, as I don't have this kind of graphics card so I can't really test this reliably... :P

davidgwyrick commented 4 years ago

Number of GPUs detected: 1

  Attribute 'GPUFanControlState' (mazzucatomarvin:0[gpu:0]) assigned value 1.

Started process for n-GPUs and n-Fans

  Attribute 'GPUTargetFanSpeed' (mazzucatomarvin:0[fan:0]) assigned value 45.

 t=33 ot=200 td=0 s=2 gpu=0 fan=0 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2
 t=33 ot=0 td=0 s=2 gpu=0 fan=1 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2
 t=33 ot=33 td=0 s=2 gpu=0 fan=0 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2
 t=33 ot=33 td=0 s=2 gpu=0 fan=1 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2
 t=33 ot=33 td=0 s=2 gpu=0 fan=0 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2
 t=33 ot=33 td=0 s=2 gpu=0 fan=1 cd=4 nsp=45 osp=45 maxt=75 mint=19 otl=2

And this just goes on, with the script never updating fan:1 (which is fan 0 in the nvidia-settings). My temperature-fan curves are both as follows, so a temperature of 33 should have set both fans to a speed of 40%. Any help would be greatly appreciated! I'm running a lot of GPU intensive analyses that push the temperature up pretty high, so it would be nice if both fans tracked the temperature. Not sure why Nvidia doesn't do this automatically.

fcurve="40 40 45 50 60 75 85 90 95 99" # fan speeds
tcurve="20 30 35 45 50 55 60 65 70 75" # temperatures
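
A quick way to check what a curve like the one above yields for a given temperature is a direct lookup. This sketch assumes the "less than or equal" rule and the 100% ceiling described in the config comments earlier in the thread still apply to this version of the script. The name `lookup_speed` is hypothetical.

```shell
#!/bin/sh
# Curves copied from the comment above.
fcurve="40 40 45 50 60 75 85 90 95 99"
tcurve="20 30 35 45 50 55 60 65 70 75"

# lookup_speed TEMP FCURVE TCURVE: print the speed paired with the first
# curve temperature that is >= TEMP, or 100 above the last temperature.
# Hypothetical helper, not nfancurve's own code.
lookup_speed() {
    temp=$1 speeds=$2
    for limit in $3; do
        speed=${speeds%% *}     # speed paired with this temperature
        speeds=${speeds#* }     # advance to the next speed
        if [ "$temp" -le "$limit" ]; then
            echo "$speed"
            return 0
        fi
    done
    echo 100
}

lookup_speed 33 "$fcurve" "$tcurve"
```

Under that assumption, 33 degrees falls in the 31 to 35 band and the lookup prints 45, which matches the `nsp=45` shown in the log lines above.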

NvidiaFan

nan0s7 commented 4 years ago

with the script never updating fan:1 (which is fan 0 in the nvidia-settings)

Wait... what? That is so weird. Shouldn't be a problem so long as the commands show the right fans.

So I've made a dirty patch for you to test out. Hopefully this fixes or at least gives us more information as to what the problem is.

Yeah I don't know why Nvidia doesn't do this automatically - even on Windows. At least Windows has third-party overclocking tools that you can use, but for some reason basically none of them support Linux.

Anyway, give that a test, and I'd of course appreciate the output of the logging to make sure it's working as expected. Thanks for your help!

davidgwyrick commented 4 years ago

I think that worked! Magic!

nan0s7 commented 4 years ago

Awesome! Let me know if there are any other problems!