marazmista / radeon-profile

Application to read current clocks of ATi Radeon cards (xf86-video-ati, xf86-video-amdgpu)
GNU General Public License v2.0

No Fan Control for RX 7900 #293

Open PorcelainMouse opened 1 year ago

PorcelainMouse commented 1 year ago

radeon-profile seemed to work great with my 5700XT and 6800XT, but it seems to have no effect at all on my 7900. Is anyone else seeing this? It wouldn't be a problem, except that the card seems to be running hot. That was also a problem with my 5700XT: the onboard/BIOS fan curve was way too shallow.

Is there something I can do to help?

PorcelainMouse commented 1 year ago

As others have reported, I also see this message in the journal around the time I start radeon-profile: "amdgpu: manual fan speed control should be enabled first".
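Some cards still use the legacy hwmon interface (RDNA2 and earlier, per this thread). On those cards, that message is, as far as I know, what amdgpu logs when pwm1 is written while pwm1_enable is not in manual mode.

Below is a minimal sketch of the expected order of writes. It assumes the generic hwmon convention: pwm1_enable 1 = manual and 2 = automatic, with pwm1 scaled from 0 to pwm1_max. The helper names are hypothetical. On RDNA3, these writes are exactly what this thread reports as not working.

```bash
#!/bin/bash
# Sketch: legacy hwmon manual fan control (reported not to work on RDNA3).
# Assumed hwmon convention: pwm1_enable 1 = manual, 2 = automatic.

# set_manual_pwm HWMON_DIR PERCENT  (hypothetical helper name)
set_manual_pwm() {
    local dir=$1 pct=$2 max=255
    # Use the driver's advertised maximum when the file exists.
    [ -r "$dir/pwm1_max" ] && max=$(<"$dir/pwm1_max")
    if [ "$pct" -lt 0 ] || [ "$pct" -gt 100 ]; then
        echo "percent must be 0..100" >&2
        return 1
    fi
    # Switch to manual mode first; writing pwm1 while in automatic mode is
    # (as far as I know) what produces the journal message above.
    echo 1 > "$dir/pwm1_enable" || return 1
    echo $(( pct * max / 100 )) > "$dir/pwm1" || return 1
}

# restore_auto_pwm HWMON_DIR  (hypothetical): hand control back to the card.
restore_auto_pwm() {
    echo 2 > "$1/pwm1_enable"
}
```

Usage, as root: source the file, then run e.g. set_manual_pwm /sys/class/drm/card0/device/hwmon/hwmon2 50, substituting your own card and hwmon numbers.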

PorcelainMouse commented 1 year ago

Okay, so this seems to be more of a problem than I expected, since I had some crashes, and all I can think of is that the card is overheating. More testing will tell, but that's all I have at this point.

In the meantime, my question is: what component is broken such that the fan control interface that exists as /sys/drm/.../hwmon/pwm1 just doesn't work for this card? Is it the kernel? Is it firmware? Is it AMDGPU?

jonesBUBer2 commented 1 year ago

> Okay, so this seems to be more of a problem than I expected, since I had some crashes, and all I can think of is that the card is overheating. More testing will tell, but that's all I have at this point.

> In the meantime, my question is: what component is broken such that the fan control interface that exists as /sys/drm/.../hwmon/pwm1 just doesn't work for this card? Is it the kernel? Is it firmware? Is it AMDGPU?

Same here, PMouse. I have some 6800XTs that run fine with radeon-profile, but the new 7900XT I just picked up has problems: no fan control and frequent reboots. It's running Ubuntu 22.04.2 on an ASUS TUF Gaming X670E board with a Ryzen 7800X3D CPU. I'm trying to run some OpenCL work under Einstein@Home, and the load that brings will sometimes crash this thing within minutes, other times after hours. I've been experimenting with underclocking and with dialing back even the default settings this motherboard applies. I'm not affected by the SoC overvoltage issue these AM5 boards had with these CPUs; I got the BIOS update before I even finished the build. I notice that with my Ubuntu kernel I'm not getting full lm-sensors data from the CPU either, although I do get a few readings from the GPU. I wish I could watch more temps, but I never see the CPU cores getting too hot, and I have a large HSF (air cooling only for now).

wae08 commented 1 year ago

I also have no fan control on a 7600. The highest I've seen the fan go is around 50%. I had a crash when I tried out Starfield and it hit 85 degrees :/ Any fix, anyone?

Carter2565 commented 1 year ago

I have the AMD Radeon™ RX 7900 XTX and still no fan control. Has anyone gotten control using another program?

PorcelainMouse commented 1 year ago

Just a follow-up here. I went deep, replaced 100% of my system and RMA-ed my card 3 times trying to troubleshoot this problem... no joy. Pretty disappointed with AMD and PowerColor.

However, they have convinced me that the problem is NOT overheating. I do want to be able to control the fan curve (I don't see any point in having such a weak fan curve when my case fans are way louder). Even so, I'm quite confident this has nothing to do with the crashing. I switched BOINC applications and the system is very stable. Apparently, the latest generation of cards self-regulates quite well. It seems the root cause of my problems is hardware/software. I've been told to try ROCm 5.7, and after that 6.0, for improved support.

PorcelainMouse commented 1 year ago

On the other hand, I see on the Phoronix forums that support for voltage and clock tuning is beyond the initial stages. I get the sense it's available, possibly only upstream, but available. I'm not sure where to look, but maybe the radeon-profile maintainers know?

StatusCode404 commented 1 year ago

> On the other hand, I see on the Phoronix forums that support for voltage and clock tuning is beyond the initial stages. I get the sense it's available, possibly only upstream, but available. I'm not sure where to look, but maybe the radeon-profile maintainers know?

Overclocking is easy on Linux! There's no real need for flashy Windows-style tools. Just understand the commands, then write a script; to make it permanent, you can run it as a startup script.

  1. Start here: https://wiki.archlinux.org/title/AMDGPU#Overclocking
  2. Then write a script

Point 1 above will give you pointers on how to set voltages, power limits, frequencies for CPU and GPU, etc.

EXAMPLE: I don't undervolt my PowerColor Liquid Devil Ultimate, since it is under coolant and is a cool-running binned chip. I have set my power limit to 381 W. I also leave my memory at default. Here is my script as an example...

6900XTXH-OC.sh

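#!/bin/bash
# Applies a 381 W power cap and a 2800 MHz max shader clock.
# card0 and hwmon2 are specific to this machine; check yours first.
set -e
DEV=/sys/class/drm/card0/device
# power1_cap is in microwatts: 381000000 uW = 381 W
echo 381000000 | sudo tee "$DEV/hwmon/hwmon2/power1_cap" > /dev/null
# 's 1 2800' sets the maximum shader clock (index 1) to 2800 MHz
echo 's 1 2800' | sudo tee "$DEV/pp_od_clk_voltage" > /dev/null
# 'c' commits the pending pp_od_clk_voltage changes
echo 'c' | sudo tee "$DEV/pp_od_clk_voltage" > /dev/null
echo "Liquid Devil Overclock applied... 2.8GHz with 381w power!"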

I basically just run the above, and the card is overclocked to what the final echo statement says.

PorcelainMouse commented 1 year ago

Cool, yeah, I guess. That's not changing the fan curve. Is there a way to do that? Fair, though; I guess I didn't make that clear in my post, but that was what I was driving at in this issue.

Where do you find the documentation for these parameters? 381,000,000? That would be in microwatts? Odd unit, but okay, I guess it needs to be an integer.
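The hwmon power values (power1_cap, power1_cap_min, power1_cap_max) are in microwatts, following the hwmon sysfs convention, so 381000000 is 381 W.

Below is a minimal sketch that converts watts and refuses values outside the range the driver advertises. The helper name is hypothetical, and HWMON_DIR stands for your card's own hwmon directory.

```bash
#!/bin/bash
# set_power_cap HWMON_DIR WATTS  (hypothetical helper name)
# hwmon power values are in microwatts, so 381 W -> 381000000.
set_power_cap() {
    local dir=$1 uw=$(( $2 * 1000000 ))
    local min max
    min=$(<"$dir/power1_cap_min") || return 1
    max=$(<"$dir/power1_cap_max") || return 1
    # Refuse anything outside the driver's advertised range.
    if [ "$uw" -lt "$min" ] || [ "$uw" -gt "$max" ]; then
        echo "cap ${2} W is outside $(( min / 1000000 ))..$(( max / 1000000 )) W" >&2
        return 1
    fi
    echo "$uw" > "$dir/power1_cap"
}
```

Run it as root, e.g. set_power_cap /sys/class/drm/card0/device/hwmon/hwmon2 381.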

StatusCode404 commented 1 year ago

> Cool, yeah, I guess. That's not changing the fan curve. Is there a way to do that? Fair, though; I guess I didn't make that clear in my post, but that was what I was driving at in this issue.

> Where do you find the documentation for these parameters? 381,000,000? That would be in microwatts? Odd unit, but okay, I guess it needs to be an integer.

Have a look at that link I pasted in the previous post, the archlinux wiki on overclocking.

You can also cross-reference with this... https://www.reddit.com/r/Amd/comments/agwroj/how_to_overclock_your_amd_gpu_on_linux/

PorcelainMouse commented 11 months ago

> Have a look at that link I pasted in the previous post, the archlinux wiki on overclocking.

Oh, I did. I didn't see anything about the fan curve there.

> You can also cross-reference with this... https://www.reddit.com/r/Amd/comments/agwroj/how_to_overclock_your_amd_gpu_on_linux/

Yes, I think this information is all quite old, though, right? These interfaces have been exposed for a long time. I don't need to control them manually, because tools like radeon-profile work fine... but not for the 7900 and RDNA3. I'm just not seeing this information as applicable. I'm NOT trying to overclock; I hope that is clear. I can see why the information I'm seeking might show up in an overclocking guide, but it isn't in the ones that have been referenced in this thread.

It's been reported by several people that these old interfaces do NOT WORK with the new generation of cards. And we know why: the old interface is being deprecated, and they haven't decided on a new standard. I know, it's frustrating. But nothing that existed before RDNA3 is going to work for RDNA3. And I'm pretty sure they're not going to expose the fan speed directly, as they have for a long time. We will not be able to set the fan speed to a constant percentage or RPM. They're not going to rebuild that capability.

StatusCode404 commented 11 months ago

> Oh, I did. I didn't see anything about the fan curve there.

Oh yes, you did mention you were after fan control. Apologies; for some reason I was only thinking of power, overclocking and voltage control.

> Yes, I think this information is all quite old, though, right? These interfaces have been exposed for a long time. I don't need to control them manually, because tools like radeon-profile work fine... but not for the 7900 and RDNA3. I'm just not seeing this information as applicable. I'm NOT trying to overclock; I hope that is clear. I can see why the information I'm seeking might show up in an overclocking guide, but it isn't in the ones that have been referenced in this thread.

Yeah mate, sorry, I thought you were after overclocking. However, the hwmon files for your RDNA3 card live under the same base path, and that includes the fan configuration, inputs and tunables. Linux doesn't change its stripes for DRM cards just because AMD has a new card; AMD has to follow the Linux "way" to interface with Linux. In your case, you will need to find the base path for your card. In Linux, every hardware device is represented by files. Everything! Disks, GPUs, printers, everything! You manipulate the file to manipulate the device. Here's mine, for example...

danglingpointer@Vault:/sys/class/drm/card0/device/hwmon/hwmon2$ ls -l
total 0
lrwxrwxrwx 1 root root    0 Nov 20 09:16 device -> ../../../0000:0e:00.0
-rw-r--r-- 1 root root 4096 Nov 25 21:56 fan1_enable
-r--r--r-- 1 root root 4096 Nov 20 09:16 fan1_input
-r--r--r-- 1 root root 4096 Nov 20 09:16 fan1_max
-r--r--r-- 1 root root 4096 Nov 20 09:16 fan1_min
-rw-r--r-- 1 root root 4096 Nov 25 21:56 fan1_target
-r--r--r-- 1 root root 4096 Nov 25 21:56 freq1_input
-r--r--r-- 1 root root 4096 Nov 25 21:56 freq1_label
-r--r--r-- 1 root root 4096 Nov 25 21:56 freq2_input
-r--r--r-- 1 root root 4096 Nov 25 21:56 freq2_label
-r--r--r-- 1 root root 4096 Nov 20 09:16 in0_input
-r--r--r-- 1 root root 4096 Nov 20 09:16 in0_label
-r--r--r-- 1 root root 4096 Nov 20 09:16 name
drwxr-xr-x 2 root root    0 Nov 20 09:15 power
-r--r--r-- 1 root root 4096 Nov 20 09:16 power1_average
-rw-r--r-- 1 root root 4096 Nov 20 09:16 power1_cap
-r--r--r-- 1 root root 4096 Nov 25 21:56 power1_cap_default
-r--r--r-- 1 root root 4096 Nov 25 21:56 power1_cap_max
-r--r--r-- 1 root root 4096 Nov 25 21:56 power1_cap_min
-r--r--r-- 1 root root 4096 Nov 20 09:16 power1_label
-rw-r--r-- 1 root root 4096 Nov 25 21:56 pwm1
-rw-r--r-- 1 root root 4096 Nov 25 21:56 pwm1_enable
-r--r--r-- 1 root root 4096 Nov 25 21:56 pwm1_max
-r--r--r-- 1 root root 4096 Nov 25 21:56 pwm1_min
lrwxrwxrwx 1 root root    0 Nov 20 09:15 subsystem -> ../../../../../../../../class/hwmon
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp1_crit
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp1_crit_hyst
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp1_emergency
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp1_input
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp1_label
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp2_crit
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp2_crit_hyst
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp2_emergency
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp2_input
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp2_label
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp3_crit
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp3_crit_hyst
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp3_emergency
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp3_input
-r--r--r-- 1 root root 4096 Nov 20 09:16 temp3_label
-rw-r--r-- 1 root root 4096 Nov 25 21:56 uevent
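The card and hwmon numbers (card0 and hwmon2 here) can differ between machines and even between boots. Below is a minimal sketch that finds the directory by its name file instead of hard-coding it.

It assumes amdgpu's hwmon name file reads amdgpu, as the DEVNAME line in the fancontrol config in this comment also shows. The helper name is hypothetical. So is the optional SYSROOT prefix, which exists only so the function can be tried on a fake tree.

```bash
#!/bin/bash
# find_amdgpu_hwmon [SYSROOT]  (hypothetical helper name)
# Prints every .../device/hwmon/hwmonN directory whose name file is "amdgpu".
find_amdgpu_hwmon() {
    local root=${1:-} dir
    for dir in "$root"/sys/class/drm/card*/device/hwmon/hwmon*; do
        # Skip unmatched globs and directories without a readable name file.
        [ -r "$dir/name" ] || continue
        if [ "$(<"$dir/name")" = "amdgpu" ]; then
            echo "$dir"
        fi
    done
}
```

On a real system, call it with no argument, e.g. HWMON=$(find_amdgpu_hwmon | head -n 1).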

> It's been reported by several people that these old interfaces do NOT WORK with the new generation of cards. And we know why: the old interface is being deprecated, and they haven't decided on a new standard. I know, it's frustrating. But nothing that existed before RDNA3 is going to work for RDNA3. And I'm pretty sure they're not going to expose the fan speed directly, as they have for a long time. We will not be able to set the fan speed to a constant percentage or RPM. They're not going to rebuild that capability.

I've got an RDNA2 card, a 6900XTXH; it is only one generation older than yours. It isn't "OLD". How do you think radeon-profile works? It uses these hwmon files. That said, my card has no fans; it comes with an EK waterblock from the factory. Even so, the original board design still triggers the creation of all those fan and PWM control files, as you can see above. They're useless for me, though, with nothing there to control or power.

If all you care about is controlling the fan, you may have to experiment at your own risk. Do be careful, though. Perhaps try pwmconfig and fancontrol. If lm-sensors picks up your fan RPMs, then there's a chance fancontrol could control the fan, provided the files are named correctly.

That Arch Linux link I gave a few posts back also has a link to a GUI for AMD fan control written in Rust. Just press Ctrl+F and search for "fan".

Good luck!

All that said, if you want maximum performance from your RDNA3 card, get a waterblock and liquid-cool it! You'll never have to worry about the fans on the card again, just the custom fans on the radiator. My radiator for the 6900XTXH has two 140 mm Noctua industrial fans on it, controlled by pwmconfig/fancontrol. Standard consumer PWM 120 mm/140 mm fans will always be easy to control with lm-sensors, pwmconfig, fancontrol and systemd.

Here's my config:

INTERVAL=10
DEVPATH=hwmon2=devices/pci0000:00/0000:00:03.1/0000:0c:00.0/0000:0d:00.0/0000:0e:00.0 hwmon3=devices/pci0000:00/0000:00:18.3 hwmon7=devices/platform/nct6775.656
DEVNAME=hwmon2=amdgpu hwmon3=k10temp hwmon7=nct6798
FCTEMPS=hwmon7/pwm3=hwmon3/temp1_input hwmon7/pwm1=hwmon2/temp1_input hwmon7/pwm2=hwmon2/temp1_input
FCFANS=hwmon7/pwm3=hwmon7/fan3_input hwmon7/pwm1=hwmon7/fan1_input hwmon7/pwm2=hwmon7/fan2_input
MINTEMP=hwmon7/pwm3=20 hwmon7/pwm1=25 hwmon7/pwm2=25
MAXTEMP=hwmon7/pwm3=70 hwmon7/pwm1=60 hwmon7/pwm2=60
MINSTART=hwmon7/pwm3=255 hwmon7/pwm1=255 hwmon7/pwm2=255
MINSTOP=hwmon7/pwm3=254 hwmon7/pwm1=189 hwmon7/pwm2=197
MINPWM=hwmon7/pwm3=254 hwmon7/pwm1=188 hwmon7/pwm2=196
MAXPWM=hwmon7/pwm3=255

That config controls two radiators, a 360 mm radiator for my CPU and a 280 mm radiator for my GPU, along with a couple of case fans.

Be careful not to manually alter the fan and PWM files, if you find them, until you are confident in what you are doing. The best way to learn is to understand how lm-sensors works, along with pwmconfig and fancontrol.
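As I understand it, fancontrol turns MINTEMP/MAXTEMP/MINSTOP/MINPWM/MAXPWM into a duty cycle with a linear ramp between the two temperatures. Below is a simplified sketch of that calculation.

This is not fancontrol's own code: it ignores MINSTART spin-up and the polling loop. The helper name is hypothetical. MAXPWM falls back to 255 when not given, which matches the entries above that omit it.

```bash
#!/bin/bash
# fancontrol_pwm TEMP MINTEMP MAXTEMP MINSTOP MINPWM [MAXPWM]  (hypothetical name)
# Temperatures in whole °C; PWM values on the 0..255 scale.
fancontrol_pwm() {
    local t=$1 mint=$2 maxt=$3 minstop=$4 minpwm=$5 maxpwm=${6:-255}
    if [ "$t" -le "$mint" ]; then
        echo "$minpwm"            # at or below MINTEMP: use MINPWM
    elif [ "$t" -ge "$maxt" ]; then
        echo "$maxpwm"            # at or above MAXTEMP: use MAXPWM
    else
        # Linear ramp from MINSTOP at MINTEMP up to MAXPWM at MAXTEMP (integer math).
        echo $(( (t - mint) * (maxpwm - minstop) / (maxt - mint) + minstop ))
    fi
}
```

Take the hwmon7/pwm1 values above: MINTEMP 25, MAXTEMP 60, MINSTOP 189 and MINPWM 188. With those, fancontrol_pwm 40 25 60 189 188 prints 217.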

PorcelainMouse commented 11 months ago

Thanks. I think we are talking past each other. I know about the /dev & /proc filesystems. This problem has been widely discussed. The PWM control is broken on RDNA3. They just didn't program that interface to work on the RDNA3 cards.

See here: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1392071-rx-7900-series-or-all-rdna3-amdgpu-fan-control-missing

That was 6 months ago. Since then, I've heard that kernel 6.7 has working fan curve control. It's NOT manual PWM control like on RDNA2 and earlier; instead, you program the curve into the card and the card follows it for you. I can't find the link right now, but I caught a bit of how to do it and it kinda made sense. Yes, it means poking at the /dev files manually, which is fine with me; I just didn't see any explanation of how to encode the curve. I haven't had time to follow up.

But! That is good news. radeon-profile should be able to implement the new fan-curve interface now that it's available in the upstream kernel. I'm running kernel 6.7-rc4 now, so I should be able to poke at it. I just need time. I'd be happy to test for radeon-profile, though!!!

PorcelainMouse commented 11 months ago

Ah, yes, I found what I was thinking about! Here is a little bit of detail about how to use the new fan curve settings: https://gitlab.freedesktop.org/drm/amd/-/issues/2402#note_2184713

And, as someone there pointed out, you can implement something like manual control by setting a single fan curve point with a low temp threshold. Makes sense. Hmm, that seems like something I could try right now... Argh, sorry, must do work! Must stay focused! Maybe during the holiday break. Good news, everyone!
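Below is a minimal sketch of that idea: generate five fan_curve points that all request the same speed, so the curve is effectively flat.

It assumes the "<point> <temperature> <fan %>" line format and the points 0-4 used in the script later in this thread. The temperatures are placeholders and the helper name is hypothetical. The driver may reject values outside your card's range; dmesg says why.

```bash
#!/bin/bash
# flat_curve_points PERCENT  (hypothetical helper name)
# Prints five "<point> <temp °C> <fan %>" lines that all use the same fan speed.
flat_curve_points() {
    local pct=$1 i
    local temps=(30 45 60 75 90)   # placeholder temperatures, one per point
    for i in 0 1 2 3 4; do
        echo "$i ${temps[$i]} $pct"
    done
}
```

Each line has to be written separately and then committed, as root. For example:

flat_curve_points 60 | while read -r line; do echo "$line" > "$FC"; done; echo c > "$FC"

Here FC is set to your card's .../gpu_od/fan_ctrl/fan_curve path.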

BlyatGif commented 10 months ago

> And, as someone there pointed out, you can implement something like manual control by setting a single fan curve point with a low temp threshold. Makes sense. Hmm, that seems like something I could try right now... Argh, sorry, must do work! Must stay focused! Maybe during the holiday break. Good news, everyone!

I tried this for the past three days with both mainline and AMD-DRM-Next kernels. It's a great idea, and something I've done in the past with other cards, but it doesn't seem to do squat, at least not on the 7900 Taichi. You can absolutely mess with the fan curve through LACT, FanControl-gui, even CoreCtrl (radeon-profile's fan controls remain greyed out), but the changes don't seem to do anything. Others had luck setting a curve via echo to /sys/class/drm/card#/device/gpu_od/fan_ctrl/fan_curve (card# being 0 or 1, depending on the system). Mine never took, whether I did it that way or through Neovim, despite having amdgpu.ppfeaturemask=0xffffffff set. It's been an aggravating weekend lol.

acheronte commented 10 months ago

So what's the current state of affairs? I am considering "upgrading" to Kernel 6.7 but need to know beforehand if fan control works on rdna3.

asumagic commented 10 months ago

> So what's the current state of affairs? I am considering "upgrading" to Kernel 6.7 but need to know beforehand if fan control works on rdna3.

@acheronte No idea about this repo, but I stumbled upon this issue while searching for the new gpu_od directory, which is under /sys/class/drm/cardXXX/device/ as described above.

The new gpu_od/fan_ctrl/fan_curve works for me on a 7900XT on 6.7 (though I didn't test extensively, it did seem to respond correctly).

Bear in mind I am a random user and not an expert. Just make sure you monitor temps after testing. To my knowledge, all you need is:

As far as I know, this is purely runtime. Resetting the GPU or rebooting should™ clear it, but I am not 100% sure.

If you get an error you can consult dmesg for the reason (e.g. for my GPU the fan percent must be 23..100).
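Below is a minimal sketch that combines those rules:

- It checks each point's fan speed against the minimum your card reports (23 on that card; a 7600 XT later in the thread reports 35).
- It writes each point as a separate write.
- It commits with c only if every point was accepted.

The helper name is hypothetical. FC_PATH would be your card's .../device/gpu_od/fan_ctrl/fan_curve on kernel 6.7+ with overdrive enabled.

```bash
#!/bin/bash
# apply_fan_curve FC_PATH MIN_PCT "POINT TEMP PCT"...  (hypothetical helper name)
apply_fan_curve() {
    local fc=$1 min=$2 point idx temp pct
    shift 2
    # Validate every point before writing any of them.
    for point in "$@"; do
        read -r idx temp pct <<< "$point"
        if [ "$pct" -lt "$min" ] || [ "$pct" -gt 100 ]; then
            echo "point $idx: fan speed $pct% outside [$min,100]" >&2
            return 1
        fi
    done
    # One write per point; stop (without committing) if the driver rejects one.
    for point in "$@"; do
        echo "$point" > "$fc" || return 1
    done
    echo c > "$fc"   # commit the staged curve
}
```

For example, as root: apply_fan_curve "$FC" 23 "0 40 30" "1 50 40" "2 60 55" "3 70 75" "4 85 100".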

To clarify, those are not files you can edit through a text editor! They don't behave anything like normal files, the kernel does whatever it wants with reads and writes, same as other sysfs tunables.

Dasfiter-S commented 9 months ago

The solution above worked awesome for me. I decided to write a script for everyone else running Arch (it worked on Garuda; not sure what would change on other flavors) to do this very easily. This script will allow you to SET YOUR OWN FAN CURVE VERY EASILY. Thank you to everyone in the thread for the info. I honestly have been lost trying to do this on my own for the past 4 months. Here is the repo: https://github.com/Dasfiter-S/arch_manual_fan_curve/blob/main/README.md

dvdesolve commented 9 months ago

I have an RX 7600 Pulse OC card from Sapphire and am experiencing the same issue. Are there any plans to implement fan curve control for the RDNA3 architecture in this software?

jazzar-dev commented 7 months ago

I am running Arch with qtile, and my GPU is an RX 7900 XT. I added amdgpu.ppfeaturemask=0xffffffff to the GRUB boot config, but I still can't find /sys/class/drm/card#/device/gpu_od. I don't have the gpu_od directory. Does anyone know if I missed something, or how to do this?

asumagic commented 7 months ago

> I am running Arch with qtile, and my GPU is an RX 7900 XT. I added amdgpu.ppfeaturemask=0xffffffff to the GRUB boot config, but I still can't find /sys/class/drm/card#/device/gpu_od. I don't have the gpu_od directory. Does anyone know if I missed something, or how to do this?

Works for me. You're on Linux >=6.7, right?

jazzar-dev commented 7 months ago

>> I am running Arch with qtile, and my GPU is an RX 7900 XT. I added amdgpu.ppfeaturemask=0xffffffff to the GRUB boot config, but I still can't find /sys/class/drm/card#/device/gpu_od. I don't have the gpu_od directory. Does anyone know if I missed something, or how to do this?

> Works for me. You're on Linux >=6.7, right?

Running 6.6.22-1-lts. Edit: just realized 6.6.22 is not >= 6.7.
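Below is a minimal sketch for checking the kernel version, using GNU sort's version ordering (sort -V). The helper name is hypothetical. The 6.7 threshold comes from this thread rather than from a kernel changelog.

```bash
#!/bin/bash
# kernel_at_least CURRENT REQUIRED  (hypothetical helper name)
# Succeeds if CURRENT sorts at or after REQUIRED in version order.
# sort -C only checks that its input is already sorted (GNU coreutils).
kernel_at_least() {
    printf '%s\n' "$2" "$1" | sort -V -C
}
```

Usage: kernel_at_least "$(uname -r)" 6.7 && echo "gpu_od should be available"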

jazzar-dev commented 7 months ago

> The solution above worked awesome for me. I decided to write a script for everyone else running Arch (it worked on Garuda; not sure what would change on other flavors) to do this very easily. This script will allow you to SET YOUR OWN FAN CURVE VERY EASILY. Thank you to everyone in the thread for the info. I honestly have been lost trying to do this on my own for the past 4 months. Here is the repo: https://github.com/Dasfiter-S/arch_manual_fan_curve/blob/main/README.md

My fan is still at 0 RPM even though my temps are in the high 40s. The fans do work on Windows, and they also worked once on Linux. I don't know what made them spin that time, but they never did again.

asumagic commented 7 months ago

The 0rpm fan behavior is unclear to me and it seems possible that it is dealt with independently from the fan curve... For instance, I've noticed that if I set a custom curve, the fan always ramps up then down after a while even though the card and hotspot are way cool enough. I haven't found a way to control this behavior.

jazzar-dev commented 7 months ago

> The 0rpm fan behavior is unclear to me and it seems possible that it is dealt with independently from the fan curve... For instance, I've noticed that if I set a custom curve, the fan always ramps up then down after a while even though the card and hotspot are way cool enough. I haven't found a way to control this behavior.

I just ran a GPU stress test: the fan ramps up, but the junction temp is already at 90-ish °C by then. I'm not sure, but I read somewhere that some cards change the fan level based on usage, not on temperature. I'll try to test this.

dvdesolve commented 7 months ago

>> The solution above worked awesome for me. I decided to write a script for everyone else running Arch (it worked on Garuda; not sure what would change on other flavors) to do this very easily. This script will allow you to SET YOUR OWN FAN CURVE VERY EASILY. Thank you to everyone in the thread for the info. I honestly have been lost trying to do this on my own for the past 4 months. Here is the repo: https://github.com/Dasfiter-S/arch_manual_fan_curve/blob/main/README.md

> My fan is still at 0 RPM even though my temps are in the high 40s. The fans do work on Windows, and they also worked once on Linux. I don't know what made them spin that time, but they never did again.

I've faced the same issue. Behavior is a bit odd, but it works (somehow)

Dasfiter-S commented 7 months ago

> The 0rpm fan behavior is unclear to me and it seems possible that it is dealt with independently from the fan curve... For instance, I've noticed that if I set a custom curve, the fan always ramps up then down after a while even though the card and hotspot are way cool enough. I haven't found a way to control this behavior.

Yes, this is AMD's incredibly annoying decision to introduce a "quiet" zone into the fan curves. No idea how to fix that. If you go into Windows, you will notice that the Adrenalin software lets you disable or tweak the quiet-zone temperature threshold, below which the fans do not run. I assume the quiet zone is part of the card's default settings, since we all have that issue at around 40-something degrees.

Banditman74 commented 5 months ago

Today I was able to run curve adjustment on Fedora 40 with kernel 6.8+. Here's what I did:

amdgpu.ppfeaturemask=0xffffffff

radeon.cik_support=0 amdgpu.cik_support=1

The parameters above need to be added, separated by spaces, to /etc/default/grub. It should look something like this:

GRUB_CMDLINE_LINUX="your_parameters amdgpu.ppfeaturemask=0xffffffff radeon.cik_support=0 amdgpu.cik_support=1"

Then rebuild the GRUB config to apply the new parameters with the command: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
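To confirm the parameter took effect after rebooting, the running mask can be read back from /sys/module/amdgpu/parameters/ppfeaturemask.

As far as I know, the bit that unlocks overdrive (and with it pp_od_clk_voltage and gpu_od) is 0x4000, PP_OVERDRIVE_MASK in the amdgpu source. Treat that constant as an assumption to double-check. The helper name is hypothetical.

```bash
#!/bin/bash
# overdrive_enabled MASK  (hypothetical helper name)
# MASK is the ppfeaturemask value, e.g. "0xffffffff".
# ASSUMPTION: 0x4000 is the overdrive bit (PP_OVERDRIVE_MASK).
overdrive_enabled() {
    (( ($1 & 0x4000) != 0 ))
}
```

Usage: overdrive_enabled "$(cat /sys/module/amdgpu/parameters/ppfeaturemask)" && echo "overdrive unlocked"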

Next, I set the fan curve using CoreCtrl and it worked!!!)))


acheronte commented 5 months ago

Thanks for reporting in @Banditman74.

My personal solution, and it might not apply to everyone, was to ditch Linux altogether and use Windows + WSL 2 to have the best of both worlds. AMD Adrenalin works on Windows and lets me set a custom fan curve with a few clicks, while WSL lets me do Linux stuff like writing code and running terminal commands, without dual booting or VMs. I have since deleted the Ubuntu partition from my computer.

I don't have the time or patience to tinker with OS settings like I used to. I have opted for the path of least resistance, so I can focus on doing work on my machine rather than fighting against it.

Banditman74 commented 5 months ago

> Thanks for reporting in @Banditman74.

> My personal solution, and it might not apply to everyone, was to ditch Linux altogether and use Windows + WSL 2 to have the best of both worlds. AMD Adrenalin works on Windows and lets me set a custom fan curve with a few clicks, while WSL lets me do Linux stuff like writing code and running terminal commands, without dual booting or VMs. I have since deleted the Ubuntu partition from my computer.

> I don't have the time or patience to tinker with OS settings like I used to. I have opted for the path of least resistance, so I can focus on doing work on my machine rather than fighting against it.

For me it was unfinished business :) I use Linux for work and Windows for entertainment and gaming, but I really wanted to find a way to make cooling work on Linux.

dvdesolve commented 5 months ago

> Today I was able to run curve adjustment on Fedora 40 with kernel 6.8+. Here's what I did:

> amdgpu.ppfeaturemask=0xffffffff

> radeon.cik_support=0 amdgpu.cik_support=1

> The parameters above need to be added, separated by spaces, to /etc/default/grub. It should look something like this:

> GRUB_CMDLINE_LINUX="your_parameters amdgpu.ppfeaturemask=0xffffffff radeon.cik_support=0 amdgpu.cik_support=1"

> Then rebuild the GRUB config to apply the new parameters with the command: sudo grub2-mkconfig -o /boot/grub2/grub.cfg

I haven't used the cik_support params, only ppfeaturemask. My fan curve script looks like this:

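#!/usr/bin/bash
# Run as root. card1 is specific to this machine; adjust it for your GPU.
# set -e stops the script (before the commit) if the driver rejects any point,
# e.g. a fan speed below the card's minimum (see dmesg for the allowed range).
set -e

FC_PATH=/sys/class/drm/card1/device/gpu_od/fan_ctrl/fan_curve

if [ ! -f "${FC_PATH}" ]; then
    echo "fan_curve file not found" >&2
    exit 1
fi

# prepare fan curve: "<point> <temperature in °C> <fan speed in %>"
echo '0 40 25'  > "${FC_PATH}"
echo '1 50 35'  > "${FC_PATH}"
echo '2 60 55'  > "${FC_PATH}"
echo '3 70 75'  > "${FC_PATH}"
echo '4 85 100' > "${FC_PATH}"

# commit the staged points
echo 'c' > "${FC_PATH}"

exit 0

crieo commented 3 months ago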

Unfortunately, at least for me, the fan curve simply gets ignored. I get that below 50 degrees the fans won't spin no matter what I do, but I was at least expecting to be able to set up a fan curve beyond that threshold. Of course I added ppfeaturemask to GRUB, but still, it's not working.


Kernel: 6.10.0-3-MANJARO
Mesa: OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.1.3-manjaro1.1

I also tried various kernels from 6.6 to 6.9, all with the same behaviour.

EDIT: I'm using a 7600 XT, not the 7900, but I think this might apply to all RDNA3 cards.

asumagic commented 3 months ago

I feel like the 0rpm behavior might be independent from the fan curve. Does it work if you e.g. have 100% fan speed set and get a high enough temp?

Also, I am not 100% sure, but I think the fan curve is governed by the hotspot temperature, not the edge temp. I don't think I've reliably seen the threshold to be 50°C on either the hotspot or edge, but it might be OEM-dependent.

Banditman74 commented 3 months ago

> I feel like the 0rpm behavior might be independent from the fan curve. Does it work if you e.g. have 100% fan speed set and get a high enough temp?

> Also, I am not 100% sure, but I think the fan curve is governed by the hotspot temperature, not the edge temp. I don't think I've reliably seen the threshold to be 50°C on either the hotspot or edge, but it might be OEM-dependent.

The fan kicks in at about 65-67 degrees at the hotspot, not at the core.

crieo commented 3 months ago

> I feel like the 0rpm behavior might be independent from the fan curve. Does it work if you e.g. have 100% fan speed set and get a high enough temp?

Yes, I tried both a fixed 35% and a fixed 100% while running FurMark 2. Either way, the built-in curve remains applied, leading to a steady fan speed increase while running...

asumagic commented 3 months ago

Can you try using the command-line way mentioned earlier in the thread? Do you get an error on any of the echo commands? Normally, when applying fails for one reason or another with the new sysfs API, it gets logged in dmesg as well, with the reason why.

crieo commented 3 months ago

First of all thanks for replying that fast and pointing me to these interesting logs!

Just tried it. The first "echo" (25% fan speed) led to an error: "Invalid argument" in the terminal, and "pwm fan curve setting (25) must be within [35,100]" in dmesg. Changing the line to 35 led to no errors. Nevertheless, the fan curve gets completely ignored. See the attached screenshot: blue is the fan, and you can see its speed increasing steadily, contrary to the fixed 35% I set up. So still, it's not working...
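The driver states the accepted range in that dmesg line. Below is a minimal sketch that extracts the bounds from kernel log text.

It only matches the wording quoted above ("pwm fan curve setting (25) must be within [35,100]"). Other kernel versions may word it differently, and the helper name is hypothetical.

```bash
#!/bin/bash
# fan_range_from_dmesg  (hypothetical helper name)
# Reads kernel log text on stdin and prints "MIN MAX" from the last line like
# "pwm fan curve setting (25) must be within [35,100]".
fan_range_from_dmesg() {
    sed -n 's/.*fan curve setting.*must be within \[\([0-9]*\), *\([0-9]*\)].*/\1 \2/p' | tail -n 1
}
```

Usage: sudo dmesg | fan_range_from_dmesg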

PorcelainMouse commented 3 months ago

Interesting. I still don't have any fan_ctrl path element anywhere below /sys/class/drm/. I have kernel 6.9 now and I would expect these interfaces to show up without fancy kernel boot options if they were really ready for prime-time. But, who knows. Glad you got it working. Maybe I'll have time to try this stuff some day, soon.

asumagic commented 3 months ago

> First of all thanks for replying that fast and pointing me to these interesting logs!

> Just tried it. The first "echo" (25% fan speed) led to an error: "Invalid argument" in the terminal, and "pwm fan curve setting (25) must be within [35,100]" in dmesg. Changing the line to 35 led to no errors. Nevertheless, the fan curve gets completely ignored. See the attached screenshot: blue is the fan, and you can see its speed increasing steadily, contrary to the fixed 35% I set up. So still, it's not working...

I am not really sure, but do make sure you're checking the hotspot temp. I feel like the one you're plotting is the edge temp, which is sometimes significantly lower, in a very workload-dependent way. I don't really recall if CoreCtrl lets you plot that, but sensors would show it.
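On amdgpu, the hwmon temp*_label files are typically edge, junction (the hotspot) and mem, and temp*_input is in millidegrees Celsius. Below is a minimal sketch that prints one of them by label, in whole °C, without needing a GUI. The helper name is hypothetical, and HWMON_DIR is your card's hwmon directory.

```bash
#!/bin/bash
# read_temp_by_label HWMON_DIR LABEL  (hypothetical helper name)
# Prints the temperature in whole °C for the tempN_label that matches LABEL.
read_temp_by_label() {
    local dir=$1 want=$2 label
    for label in "$dir"/temp*_label; do
        [ -r "$label" ] || continue
        if [ "$(<"$label")" = "$want" ]; then
            # tempN_input is in millidegrees Celsius
            echo $(( $(<"${label%_label}_input") / 1000 ))
            return 0
        fi
    done
    echo "no temperature labelled '$want' in $dir" >&2
    return 1
}
```

For example, read_temp_by_label "$HWMON" junction gives the hotspot temperature the fan curve appears to follow.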

asumagic commented 3 months ago

> Interesting. I still don't have any fan_ctrl path element anywhere below /sys/class/drm/. I have kernel 6.9 now and I would expect these interfaces to show up without fancy kernel boot options if they were really ready for prime-time. But, who knows. Glad you got it working. Maybe I'll have time to try this stuff some day, soon.

It's not surprising to me that stuff like fan curves, power limits, etc. are gated behind a kernel commandline flag. I think this was the case for the prior interfaces as well.

crieo commented 3 months ago

>> First of all thanks for replying that fast and pointing me to these interesting logs!

>> Just tried it. The first "echo" (25% fan speed) led to an error: "Invalid argument" in the terminal, and "pwm fan curve setting (25) must be within [35,100]" in dmesg. Changing the line to 35 led to no errors. Nevertheless, the fan curve gets completely ignored. See the attached screenshot: blue is the fan, and you can see its speed increasing steadily, contrary to the fixed 35% I set up. So still, it's not working...

> I am not really sure, but do make sure you're checking the hotspot temp. I feel like the one you're plotting is the edge temp, which is sometimes significantly lower, in a very workload-dependent way. I don't really recall if CoreCtrl lets you plot that, but sensors would show it.

Hi asu,

I'm baffled: you're right. I could only check memory and junction temperature in CoreCtrl, but they are nevertheless excellent indicators of the curve's behaviour. Long story short, I was reading the wrong sensor. Everything works as expected on my side! I apologize.

The only remaining issue I have is that when idle or only lightly browsing, GPU activity spikes enormously, but no monitoring tool I found could give me a hint as to where this comes from... but this is off-topic :D!