markusressel / fan2go

A simple daemon providing dynamic fan speed control based on temperature sensors.
GNU Affero General Public License v3.0
234 stars, 22 forks

Unable to set up my fans, fan2go.yaml settings seem to be ignored #324

Open dinvlad opened 1 month ago

dinvlad commented 1 month ago

Describe the bug I'm having trouble setting up my fan curves. I have set minPwm, startPwm and maxPwm for each fan, but fans are still getting initialized on every run, and the curve values don't appear to follow what I wanted.

To Reproduce Steps to reproduce the behavior:

  1. Run sudo /nix/store/84601dyyp25hqbq9a49kplnj76ljh1ji-fan2go-0.9.0/bin/fan2go -c fan2go.yaml -v (see the config below)
  2. Observe fans being initialized, with a long stream of Setting Fan PWM of ..., then Saving pwm map to fan..., and finally Measuring RPM of ...
  3. Observe a stream of Evaluating curve with Desired PWM: 0, despite temperature readings from sensors being noticeably above their min values.
  4. Re-run (1) and observe the same behavior each time (actually, Measuring part does seem to be skipped on subsequent runs; but not Setting Fan PWM part, which still takes a while and results in undesirable "slow spin down" behavior at the start)

Expected behavior

  1. Fans should not be initialized again on each run.
  2. Fans' PWM should be set according to linear curves (or an average of them) between min and max temp readings, and the corresponding minPwm and maxPwm values.

Screenshots My config:

dbPath: /var/lib/fan2go/fan2go.db

sensors:
  - id: cpu_package
    hwmon:
      platform: coretemp
      index: 1

  - id: disk_0
    hwmon:
      platform: drivetemp-scsi-0-0
      index: 1

  - id: disk_4
    hwmon:
      platform: drivetemp-scsi-4-0
      index: 1

  - id: nvme_0
    hwmon:
      platform: nvme-pci-01.*
      index: 1

  - id: nvme_1
    hwmon:
      platform: nvme-pci-06.*
      index: 1

curves:
  - id: cpu
    linear:
      sensor: cpu_package
      min: 20000
      max: 80000

  - id: disk_0
    linear:
      sensor: disk_0
      min: 20000
      max: 60000

  - id: disk_4
    linear:
      sensor: disk_4
      min: 20000
      max: 60000

  - id: bay_0
    function:
      type: average
      curves:
      - disk_0

  - id: bay_1
    function:
      type: average
      curves:
      - disk_4

  - id: nvme_0
    linear:
      sensor: nvme_0
      min: 20000
      max: 70000

  - id: nvme_1
    linear:
      sensor: nvme_1
      min: 20000
      max: 70000

  - id: mobo
    function:
      type: average
      curves:
      - nvme_0
      - nvme_1

fans:
  - id: cpu
    hwmon:
      platform: quadro
      rpmChannel: 3
      pwmChannel: 3
    controlAlgorithm: direct
    neverStop: true
    curve: cpu
    minPwm: 33
    startPwm: 33
    maxPwm: 255

  - id: bay_0
    curve: bay_0
    hwmon:
      platform: quadro
      rpmChannel: 2
      pwmChannel: 2
    controlAlgorithm: direct
    neverStop: true
    minPwm: 90
    startPwm: 90
    maxPwm: 200

  - id: bay_1
    curve: bay_1
    hwmon:
      platform: quadro
      rpmChannel: 4
      pwmChannel: 4
    controlAlgorithm: direct
    neverStop: true
    minPwm: 90
    startPwm: 90
    maxPwdm: 200

  - id: mobo
    curve: mobo
    hwmon:
      platform: quadro
      rpmChannel: 1
      pwmChannel: 1
    controlAlgorithm: direct
    neverStop: true
    minPwm: 25
    startPwm: 25
    maxPwm: 255

Here's what the log said after the (very long) initialization (which happens on each run):

  DEBUG   Evaluating curve 'cpu'. Sensor 'cpu_package' temp '32°'. Desired PWM: 0
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '35°'. Desired PWM: 0
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '35°'. Desired PWM: 0
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 0
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 0

And here's what fan2go detect showed around that same time:

> jc42
  Sensors  Index  Label                       Value
           1      hwmon6/temp1 (temp1_input)  30500

> drivetemp-scsi-4-0
  Sensors  Index  Label                       Value
           1      hwmon4/temp1 (temp1_input)  22000

> coretemp-isa-0000
  Sensors  Index  Label                       Value
           1      Package id 0 (temp1_input)  31000
           2      Core 0 (temp2_input)        25000
           3      Core 4 (temp6_input)        28000
           4      Core 8 (temp10_input)       26000
           5      Core 12 (temp14_input)      30000
           6      Core 16 (temp18_input)      28000
           7      Core 20 (temp22_input)      28000
           8      Core 24 (temp26_input)      29000
           9      Core 25 (temp27_input)      29000
           10     Core 26 (temp28_input)      28000
           11     Core 27 (temp29_input)      28000
           12     Core 28 (temp30_input)      27000
           13     Core 29 (temp31_input)      27000
           14     Core 30 (temp32_input)      27000
           15     Core 31 (temp33_input)      27000

> nvme-pci-0100
  Sensors  Index  Label                    Value
           1      Composite (temp1_input)  35850

> quadro-hid-3-6
  Fans     Index  Channel  Label              RPM  PWM  Auto 
           1      1        Fan 1 speed        327  25   false
           2      2        Fan 2 speed        577  90   false
           3      3        Fan 3 speed        400  33   false
           4      4        Fan 4 speed        588  90   false
           5      5        Flow speed [dL/h]  0    N/A  false
  Sensors  Index  Label                             Value
           1      Sensor 1 (temp1_input)            N/A  
           2      Sensor 2 (temp2_input)            N/A  
           3      Sensor 3 (temp3_input)            N/A  
           4      Sensor 4 (temp4_input)            N/A  
           5      Virtual sensor 1 (temp5_input)    N/A  
           6      Virtual sensor 2 (temp6_input)    N/A  
           7      Virtual sensor 3 (temp7_input)    N/A  
           8      Virtual sensor 4 (temp8_input)    N/A  
           9      Virtual sensor 5 (temp9_input)    N/A  
           10     Virtual sensor 6 (temp10_input)   N/A  
           11     Virtual sensor 7 (temp11_input)   N/A  
           12     Virtual sensor 8 (temp12_input)   N/A  
           13     Virtual sensor 9 (temp13_input)   N/A  
           14     Virtual sensor 10 (temp14_input)  N/A  
           15     Virtual sensor 11 (temp15_input)  N/A  
           16     Virtual sensor 12 (temp16_input)  N/A  
           17     Virtual sensor 13 (temp17_input)  N/A  
           18     Virtual sensor 14 (temp18_input)  N/A  
           19     Virtual sensor 15 (temp19_input)  N/A  
           20     Virtual sensor 16 (temp20_input)  N/A  

> jc42
  Sensors  Index  Label                       Value
           1      hwmon5/temp1 (temp1_input)  31500

> drivetemp-scsi-0-0
  Sensors  Index  Label                       Value
           1      hwmon3/temp1 (temp1_input)  23000

> nvme-pci-06f00
  Sensors  Index  Label                    Value
           1      Composite (temp1_input)  34850

And here are also the fan curves:

cpu

  Min PWM    33 
  Start PWM  33 
  Max PWM    255

 2550 ┤                                                                                              ╭────
 2380 ┤                                                                                     ╭────────╯
 2210 ┤                                                                            ╭────────╯
 2040 ┤                                                                   ╭────────╯
 1870 ┤                                                            ╭──────╯
 1700 ┤                                                     ╭──────╯
 1530 ┤ ╭─╮                                         ╭───────╯
 1360 ┤╭╯ ╰──╮                                 ╭────╯
 1190 ┤│     │                         ╭───────╯
 1020 ┤│     │                     ╭───╯
  850 ┤│     │                 ╭───╯
  680 ┤│     │            ╭────╯
  510 ┤│     ╰╮      ╭────╯
  340 ┤│      ╰──────╯
  170 ┤│
    0 ┼╯
                                                    RPM / PWM

bay_0

  Min PWM    90 
  Start PWM  90 
  Max PWM    200

 1404 ┤                                                                         ╭─────────────────────────
 1310 ┤                                                                     ╭───╯
 1217 ┤                                                                 ╭───╯
 1123 ┤                                                             ╭───╯
 1030 ┤                                                       ╭─────╯
  936 ┤                                                      ╭╯
  842 ┤                                                  ╭───╯
  749 ┤                                               ╭──╯
  655 ┤╭╮                                       ╭─────╯
  562 ┤│╰───────────────────────────────────────╯
  468 ┤│
  374 ┤│
  281 ┤│
  187 ┤│
   94 ┤│
    0 ┼╯
                                                    RPM / PWM

bay_1

  Min PWM    90 
  Start PWM  90 
  Max PWM    232

 1412 ┤                                                                          ╭────────────────────────
 1318 ┤                                                                      ╭───╯
 1224 ┤                                                                  ╭───╯
 1130 ┤                                                              ╭───╯
 1035 ┤                                                         ╭────╯
  941 ┤                                                       ╭─╯
  847 ┤                                                    ╭──╯
  753 ┤                                                ╭───╯
  659 ┤╭╮                                       ╭──────╯
  565 ┤│╰───────────────────────────────────────╯
  471 ┤│
  377 ┤│
  282 ┤│
  188 ┤│
   94 ┤│
    0 ┼╯
                                                    RPM / PWM

mobo

  Min PWM    25 
  Start PWM  25 
  Max PWM    255

 5145 ┤                                                                                               ╭───
 4802 ┤                                                                                         ╭─────╯
 4459 ┤                                                                                  ╭──────╯
 4116 ┤                                                                         ╭────────╯
 3773 ┤                                                                   ╭─────╯
 3430 ┤                                                             ╭─────╯
 3087 ┤╭╮                                                    ╭──────╯
 2744 ┤││                                              ╭─────╯
 2401 ┤││ ╭╮                                    ╭──────╯
 2058 ┤││ ││                               ╭────╯
 1715 ┤││╭╯│                        ╭──────╯
 1372 ┤│││ │                   ╭────╯
 1029 ┤│││ │             ╭─────╯
  686 ┤│││ │         ╭───╯
  343 ┤│╰╯ ╰─────────╯
    0 ┼╯
                                                    RPM / PWM


Additional context I've tried various settings, setting the steps: or sensor readings in degrees rather than milli-degrees. But nothing seems to matter - either I get fans set to 255, or 0.. And initialization takes a while each time.

dinvlad commented 1 month ago

OK, looks like after changing min and max values from milli-degrees to degrees again, this time it's working - although the PWM values it sets in log messages don't seem to correspond to "desired PWM" 🤔

  DEBUG   Setting Fan PWM of 'mobo' to 85 ...
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '33°'. Desired PWM: 65
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '34°'. Desired PWM: 70
  DEBUG   Setting Fan PWM of 'cpu' to 59 ...
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
  DEBUG   Evaluating curve 'cpu'. Sensor 'cpu_package' temp '28°'. Desired PWM: 32
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '33°'. Desired PWM: 65
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '34°'. Desired PWM: 70
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19

I did check the actual RPM values currently set, and they do seem to roughly correspond to where on the graph they would be, at that temperature.

So I think the remaining issue is initialization..
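For illustration, here is a minimal sketch of why the unit of min and max matters. It assumes a linear curve scales the sensor reading onto 0..255 between min and max and clamps outside that range, and that the reading is compared in degrees (as the log output suggests). The function name linearPWM is hypothetical and this is not fan2go's actual code.

```go
package main

import "fmt"

// linearPWM maps temp onto 0..255 by linear interpolation between
// minTemp and maxTemp, clamping outside that range.
// Assumption: illustration only, not fan2go's implementation.
func linearPWM(temp, minTemp, maxTemp float64) int {
	if temp <= minTemp {
		return 0
	}
	if temp >= maxTemp {
		return 255
	}
	ratio := (temp - minTemp) / (maxTemp - minTemp)
	return int(ratio * 255)
}

func main() {
	// Bounds given in milli-degrees while the reading is in degrees:
	// the reading is always far below min, so the result is 0.
	fmt.Println(linearPWM(32, 20000, 80000)) // 0

	// Bounds given in degrees: disk_0 at 23 and disk_4 at 22 (min 20, max 60).
	fmt.Println(linearPWM(23, 20, 60)) // 19
	fmt.Println(linearPWM(22, 20, 60)) // 12
}
```

Under these assumptions the disk values agree with the "Desired PWM: 19" and "Desired PWM: 12" log lines above, while bounds of 20000 and 80000 can only ever produce 0.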

dinvlad commented 1 month ago

OK, and now I was able to skip initialization as well, by adding this to every fan:

    pwmMap:
      0: 0
      255: 255

I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.

dinvlad commented 1 month ago

Actually, I spoke too soon. While the desired values reported in the log now with this pwmMap seem to be the same as before, it sets PWM to 0 now (which is also reflected in all 0s reported by detect):

  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 56
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 67
  DEBUG   Setting Fan PWM of 'bay_0' to 0 ...
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
  DEBUG   Setting Fan PWM of 'mobo' to 0 ...
  DEBUG   Evaluating curve 'cpu'. Sensor 'cpu_package' temp '30°'. Desired PWM: 44
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
  DEBUG   Setting Fan PWM of 'bay_1' to 0 ...
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 55
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 66
  DEBUG   Setting Fan PWM of 'cpu' to 0 ...
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12
  DEBUG   Evaluating curve 'cpu'. Sensor 'cpu_package' temp '30°'. Desired PWM: 40
  DEBUG   Evaluating curve 'disk_0'. Sensor 'disk_0' temp '23°'. Desired PWM: 19
  DEBUG   Evaluating curve 'nvme_0'. Sensor 'nvme_0' temp '31°'. Desired PWM: 57
  DEBUG   Evaluating curve 'nvme_1'. Sensor 'nvme_1' temp '33°'. Desired PWM: 66
  DEBUG   Evaluating curve 'disk_4'. Sensor 'disk_4' temp '22°'. Desired PWM: 12

markusressel commented 1 month ago

I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.

Without a doubt, it's hard for me to explain all of this as I know too much about it (if you know what I mean).

There are definitely some misunderstandings here, and a lot to unpack, I will get back to you after work 🤞

markusressel commented 1 month ago

Hi @dinvlad , thx for your interest! Looks like you have a real spicy system to test fan2go on at your hands :smile:

I'm having trouble setting up my fan curves. I have set minPwm, startPwm and maxPwm for each fan, but fans are still getting initialized on every run, and the curve values don't appear to follow what I wanted.

The fan curve output does not change based on these parameters. The fan curve simply prints out a (rudimentary) graph of the PWM -> RPM measurement that fan2go took during the initialization of a fan. The Min/Start/Max are simply an indication of what the algorithm thinks these values should be based on the graph, unless you override them yourself in the config.

Fans should not be initialized again on each run.

They are not, at least not entirely. If you set overrides for the PWM "limits", the long initialization should be skipped entirely.

What is not skipped, however, and is executed at each startup of fan2go, is the calculation of the pwmMap (again, unless you set it yourself in the config). The pwmMap is used to make fan2go work with fans that do not operate in the expected 0..255 range, but e.g. a 0..100 range, or even an extremely limited set like [0, 125, 255]. The only way for fan2go to determine this by itself is through trial and error, by setting every possible value in 0..255 and checking if it succeeded. Since this can change due to external factors like driver updates (for a fan controller), this check is currently done on each startup and should only take about a second. It's certainly not ideal, but it's what we've got right now.

By specifying this:

    pwmMap:
      0: 0
      255: 255

you are essentially telling fan2go that this particular fan definition can only use the PWM values 0 and 255 and nothing in between, which is probably not what you intended to do.
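A rough sketch of how a two-entry map could produce the "Setting Fan PWM ... to 0" lines reported earlier. It assumes (this is not confirmed in this thread) that a desired PWM is snapped to the nearest value present in the map. The helper closestKey is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// closestKey returns the key in pwmMap nearest to target.
// Assumption: hypothetical helper; ties resolve to the lower key.
func closestKey(pwmMap map[int]int, target int) int {
	best, bestDist := -1, math.MaxInt
	for k := range pwmMap {
		d := k - target
		if d < 0 {
			d = -d
		}
		if d < bestDist || (d == bestDist && k < best) {
			best, bestDist = k, d
		}
	}
	return best
}

func main() {
	twoPoint := map[int]int{0: 0, 255: 255}
	for _, desired := range []int{19, 67, 200} {
		k := closestKey(twoPoint, desired)
		fmt.Printf("desired %d -> set %d\n", desired, twoPoint[k])
	}
}
```

With only 0 and 255 available, every desired value below the midpoint lands on 0 in this sketch, which would match the behaviour seen in the log.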

The "expected" pwmMap would look like this (abbreviated for readability):

pwmMap:
    0: 0
    1: 1
    2: 2
    ....
    254: 254
    255: 255

There is nothing wrong with specifying this in the config to skip the pwmMap initialization, except it's a bit ugly.

Fans' PWM should be set according to linear curves (or an average of them) between min and max temp readings, and the corresponding minPwm and maxPwm values.

I agree, that is precisely what should happen and that is what is (hopefully) implemented. If you see anything different, feel free to investigate and report, or even fix it :+1:

I think the docs could be a little more clear on all of this, but perhaps I just didn't read them carefully enough.

Again about this: If you have suggestions on how to change the README to better reflect this, please open a PR and let me know!

dinvlad commented 1 month ago

Thanks for clarification @markusressel , that makes sense.

this check is currently done on each startup and should only take about a second

Sadly, in my case it takes much, much longer than a second - I just measured and it was almost 11 minutes (!)

3.02user 5.76system 10:46.38elapsed

I've now specified pwmMap with all values between [0..255] at the top level of fan2go.yaml, and used YAML anchors to refer to it in fans. This seems to have addressed the issue of slow startup for now. However, I wonder if we should add a simple boolean flag to skip this feature entirely, since it's not intuitive for newcomers how to turn it off with pwmMap, and the check can take this long even though it is meant to be quick.

markusressel commented 1 month ago

The reason this is done is because fan2go tries to automatically detect the best mode of operation. If we disable this feature by default, we might as well remove the entire logic, document the config and call it a day. This feature was added by request from fan2go users, I have never needed it myself. Some big brand "gaming" fan controllers just seem to work in weird and unexpected ways because... reasons :smile:

I am not too keen on going that route yet, can we possibly figure out why it takes so long on your system? All fan2go does is this:

// check every pwm value
pwmMap := map[int]int{}
for i := fans.MaxPwmValue; i >= fans.MinPwmValue; i-- {
    _ = fan.SetPwm(i)
    time.Sleep(pwmSetGetDelay)
    pwm, err := fan.GetPwm()
    if err != nil {
        ui.Warning("Error reading PWM value of fan %s: %v", fan.GetId(), err)
    }
    pwmMap[i] = pwm
}
f.pwmMap = pwmMap

SetPwm and GetPwm simply write/read an integer to/from a file. If this is slow on your system, there has to be a reason for it. Maybe we can account for that reason somehow?

PS: Nice trick using the anchors, I didn't even know viper supports this :smile:

dinvlad commented 1 month ago

Thanks @markusressel - I did notice a delay even when running fan2go detect. I think it's probably inherent in the fan controller I'm using - Aquacomputer Quadro, connected to the PC over USB.

Looking at their code and associated issue (which mentions fan2go btw), it sounds like the controller is "slow" enough that they had to introduce a ~200ms delay between reads and writes:

https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/blob/f20c53c7edaee2a57b7aee7a64358864d207e75f/aquacomputer_d5next.c#L852-L864

https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/blob/f20c53c7edaee2a57b7aee7a64358864d207e75f/aquacomputer_d5next.c#L75

https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82#issuecomment-1637173240

256 * 200ms * 2 (read + write) * 4 fans is ~410s or ~7 mins, close to the 11 mins I'm seeing (there's also likely a USB communication delay, to further account for the difference).

So given that, would it be reasonable to introduce a config param to opt-out of PWM mapping (i.e. effectively assuming 0, 1, ... 255 -> 0, 1, ... 255 map)? Obviously, it would come with a disclaimer that fan control could be less accurate then.

markusressel commented 1 month ago

I am wondering if it makes sense to detect these devices and use specific defaults for them. Do the fans have a specific platform name that's unique to this controller? Maybe we can come up with a system (f.ex. additional config files) to specify overrides for specific platforms so other people can benefit from the findings that were made in issues like this one 🤔 I would have to look into it, but I would guess that there are even more specific IDs for the controller exposed somewhere, if the platform isn't specific enough.

dinvlad commented 1 month ago

These are just generic PWM fans (along with Noctua for the CPU), so I don't think we can detect the fans per se. But we can probably detect the controller (Quadro, in this case). I think potentially relying just on controller name (as reported by fan2go detect, for example) would be sufficient in this case.
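As a rough sketch of the idea discussed above, per-controller defaults could be looked up by matching the platform name reported by fan2go detect. Everything here is hypothetical: the platformDefaults type, the SkipPwmMapDetection field, and the defaultsFor function do not exist in fan2go, and the pattern is only an assumption based on the "quadro-hid-3-6" name shown earlier.

```go
package main

import (
	"fmt"
	"regexp"
)

// platformDefaults is a hypothetical override record for one controller family.
type platformDefaults struct {
	SkipPwmMapDetection bool // assume an identity 0..255 map instead of probing
}

// knownPlatforms pairs a platform-name pattern with its defaults.
var knownPlatforms = []struct {
	pattern  *regexp.Regexp
	defaults platformDefaults
}{
	{regexp.MustCompile(`^quadro`), platformDefaults{SkipPwmMapDetection: true}},
}

// defaultsFor returns the defaults of the first matching pattern, if any.
func defaultsFor(platform string) (platformDefaults, bool) {
	for _, p := range knownPlatforms {
		if p.pattern.MatchString(platform) {
			return p.defaults, true
		}
	}
	return platformDefaults{}, false
}

func main() {
	d, ok := defaultsFor("quadro-hid-3-6")
	fmt.Println(d.SkipPwmMapDetection, ok) // true true

	_, ok = defaultsFor("coretemp-isa-0000")
	fmt.Println(ok) // false
}
```

Keeping the table as data would let findings from issues like this one be contributed without touching the control logic.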