nan0s7 / nfancurve

A small and lightweight POSIX script for using a custom fan curve in Linux for those with an Nvidia GPU.
GNU General Public License v3.0

script looks at temp of 1st GPU only #24

Closed michaeltlu closed 4 years ago

michaeltlu commented 4 years ago

Nice job on this script!

I have one issue. In a multi-GPU setup, I've noticed that the script's actions are based on the temperature of the first GPU only. So when only the 2nd GPU is in use, the fan speed does not increase with the 2nd GPU's temperature. To make matters worse, the periodic checks on the temperature of the 1st GPU suppress the default Nvidia fan curve, leading to lower fan speeds on GPU2 than are appropriate.

Would it be possible to modify get_temp() so that it checks the temperature of all of the GPUs, then returns the temperature of the hottest one?
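A minimal sketch of the "hottest GPU" idea described in this request. It is not the script's actual get_temp(); the helper name max_temp is hypothetical, and the readings in the example are made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfancurve): print the highest value
# from a list of integer temperatures passed as arguments.
max_temp() {
    max=""
    for t in "$@"; do
        # The first value always becomes the current maximum.
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max="$t"
        fi
    done
    printf '%s\n' "$max"
}

# Example with made-up readings for three GPUs.
max_temp 45 67 52
```

On a real system the arguments could come from something like `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits`, which prints one temperature per GPU; wiring that into the script is left as an assumption here.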

nan0s7 commented 4 years ago

Hmm I'll give it a look; I remember wanting to do something similar with regards to using the hottest GPU for the calculations. I'll keep you up to date with anything I change, especially with regards to only using the non-first GPU in a multi-GPU setup.

michaeltlu commented 4 years ago

Thank you!

michaeltlu commented 4 years ago

I apologize, this was my mistake. I didn't correctly change the config file to specify that each of my GPUs has one fan rather than two. As a result, both GPUs' single fans were tied to the first GPU's temperature. Once changed in the config file, the script is working as intended.

Sorry for the misunderstanding, and thanks for writing this script!

imadcat commented 4 years ago

Could you share how to change the config for independent fan control on multiple GPUs?

nan0s7 commented 4 years ago

@imadcat Well, by default it should work fine unless you have an unusual GPU or one I haven't tested yet. What exactly do you want to change? By default it assigns alternating fan curves to each fan, so if a single GPU has more than one controllable fan, you may need to change the which_curve setting in the config file.

imadcat commented 4 years ago

Thank you for the quick reply! I have two 1080 Ti cards on an Ubuntu 20.04 system, and each card has only one fan. The default code after cloning your repo doesn't work for me. Here's what works on my system:

  1. modify nfancurve.service file for service auto start
    ExecStart=/bin/sh /opt/nfancurve/temp.sh -c /opt/nfancurve/config
    mv nfancurve.service /etc/systemd/user/
    sudo systemctl --user start nfancurve.service
    sudo systemctl --user status nfancurve.service
    sudo systemctl --user enable nfancurve.service
  2. enable multiple GPU setup
    nvidia-xconfig --enable-all-gpus
    nvidia-xconfig --cool-bits=4
  3. modify config file
    which_curve="1 1 1 1"
    fan2gpu="0 1 1 1"

Note that without modification 3 the service can still control both GPU cards' fans, but both fan speeds are tied to GPU:0's temperature. This means that even when GPU:1 is idle and its temperature is low, its fan will run at 100% as long as GPU:0's temperature is high.

I think the default setting fan2gpu="0 0 1 1" is confusing: it causes both GPU cards' fans to be associated with GPU:0's temperature.
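To make the difference between the two fan2gpu values concrete, here is a small sketch. It assumes, based only on the behaviour described in this thread, that the Nth entry of fan2gpu (counting from 0) is the index of the GPU whose temperature drives fan N. The function gpu_for_fan is hypothetical and is not taken from the script.

```shell
#!/bin/sh
# Hypothetical illustration (not nfancurve's actual code).
# Assumption: entry N of the list is the GPU index that drives fan N.
# $1 = fan index (zero-based), $2 = fan2gpu-style space-separated list
gpu_for_fan() {
    fan="$1"
    i=0
    # $2 is deliberately unquoted so the list splits into entries.
    for gpu in $2; do
        if [ "$i" -eq "$fan" ]; then
            printf '%s\n' "$gpu"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Two single-fan cards expose fan 0 and fan 1.
for fan in 0 1; do
    printf 'default:  fan %s -> GPU %s\n' "$fan" "$(gpu_for_fan "$fan" "0 0 1 1")"
    printf 'modified: fan %s -> GPU %s\n' "$fan" "$(gpu_for_fan "$fan" "0 1 1 1")"
done
```

Under that assumption, the default list maps both existing fans to GPU 0, while the modified list maps fan 1 to GPU 1, which matches the behaviour reported above.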

nan0s7 commented 4 years ago

Thanks for the detailed reply!

So yeah, with your config fan2gpu will have to be changed from the default value in the config. I should be able to add a check function to this fairly easily.

Changing which_curve shouldn't be required, though. If you leave that at the default value (1 2 1 2), does it still work as expected? If not, there must be a dependence between the array elements of the two settings.
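One possible shape for the fan2gpu check mentioned above, as a sketch only. The function name check_fan2gpu, its arguments, and the warning text are all hypothetical; it again assumes that entry N of fan2gpu is the GPU index for fan N, and it simply warns when a GPU has no fan mapped to it among the fans that actually exist.

```shell
#!/bin/sh
# Hypothetical check (not part of nfancurve).
# $1 = number of GPUs, $2 = number of fans, $3 = fan2gpu-style list
check_fan2gpu() {
    num_gpus="$1"
    num_fans="$2"
    status=0
    gpu=0
    while [ "$gpu" -lt "$num_gpus" ]; do
        found=0
        i=0
        for entry in $3; do
            # Only the first num_fans entries correspond to real fans.
            if [ "$i" -ge "$num_fans" ]; then
                break
            fi
            if [ "$entry" -eq "$gpu" ]; then
                found=1
            fi
            i=$((i + 1))
        done
        if [ "$found" -eq 0 ]; then
            printf 'warning: no fan is mapped to GPU %s\n' "$gpu"
            status=1
        fi
        gpu=$((gpu + 1))
    done
    return "$status"
}

# Two GPUs with one fan each: the default list leaves GPU 1 without a fan.
check_fan2gpu 2 2 "0 0 1 1" || echo "fan2gpu needs editing for this setup"
check_fan2gpu 2 2 "0 1 1 1" && echo "fan2gpu covers every GPU"
```

The GPU and fan counts are passed in by hand here; how the script would discover them is not covered by this sketch.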

imadcat commented 4 years ago

Changing which_curve is not required; the default value still works. I just wanted the 2 GPUs to behave the same.

imadcat commented 4 years ago

@nan0s7 any plans to make this work in headless server mode?

nan0s7 commented 4 years ago

Well, technically the script works fine headlessly. But nvidia-settings doesn't work very well without a display server. Apparently some people have been able to make nvidia-settings work by using a fake / dummy X server, but so far I don't know of anyone who has been successful using that in tandem with my script. I am thinking of looking into an alternate method of calling functions from nvidia-settings, but I doubt that would improve things much. There might be a way to control things directly without nvidia-settings, but I can't really test that easily... :/