horshack-dpreview / setPL

Set PL1 and PL2 power limits for modern Intel processors running on Linux
GNU General Public License v3.0

Setting PL time in MSR #9

Open Petrusion opened 1 month ago

Petrusion commented 1 month ago

Would it be difficult to add an option to set PL time in MSR? It would greatly help my use case.

Setting PL time to some large value (hours) in MSR is the only way my CPU can get above 45W in the long-term. It is a notebook so I can't just increase the TDP even though it can withstand 90W without thermal throttling. I already tried setting the MMIO time but it gets ignored in favour of time set in MSR.

Unfortunately I can only get good TDP in Windows, because there ThrottleStop sets the MSR time and leaves the MMIO time at its default value.

horshack-dpreview commented 4 weeks ago

Sure, I think so. Please give me a few days. It's been a while since I've worked on this code and I need to find an Intel notebook to test the change with.

Petrusion commented 4 weeks ago

Awesome, thank you so much! If you need me to try something out on my machine (11980HK) don't hesitate to contact me.

horshack-dpreview commented 3 weeks ago

I found where the PL time can be set using the same /sys tree where my script currently sets the PL watts. Before I consider how to integrate this into the script, could you first try setting the values manually and confirm you get the expected increase in perf under Linux? I'm not able to perturb the performance on the Intel laptop I'm using, so I want to make sure the value is being honored, even though it should be, since I see the value I echo to /sys being encoded into the MSR.

Here's a sample command to set both PL time windows to 4 hours:

echo "14400000000" | tee /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_1_time_window_us

And the command to view the currently set time:

cat /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_[0-1]_time_window_us
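The 14400000000 figure above is 4 hours expressed in microseconds. A minimal sketch of that arithmetic follows, together with a small helper that writes one value to both constraints of a zone. The function names hours_to_us and set_time_windows are hypothetical and not part of setPL; writing to the real /sys files needs root.

```shell
# Convert whole hours to microseconds, the unit used by *_time_window_us.
hours_to_us() {
  echo $(( $1 * 3600 * 1000000 ))
}

# Hypothetical helper (not part of setPL): write the same window to both
# constraints of one RAPL zone directory.
#   $1 = zone directory, $2 = time window in microseconds
set_time_windows() {
  for c in 0 1; do
    echo "$2" > "$1/constraint_${c}_time_window_us" || return 1
  done
}

# Example (as root), using the zone path from the command above:
#   set_time_windows /sys/class/powercap/intel-rapl/intel-rapl:0 "$(hours_to_us 4)"
```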

Petrusion commented 3 weeks ago

This doesn't work unfortunately. It goes back to 45W after a minute. I might very well be wrong about this, but the way I understand it I'm using MMIO by writing into these files, not MSR.

All I know is in Windows Throttlestop claims to set the time in MSR, leaving MMIO time at its default.

> even though it should since I see the value I echo to /sys being encoded into the MSR

How do I check if the time is encoded into the MSR on my machine after setting it via MMIO?

horshack-dpreview commented 3 weeks ago

On my system, changing those time windows does get reflected in the MSR. You can verify this by running setPL before making the change and again after. You'll see the value reflected in the MSR_PKG_POWER_LIMIT value displayed.

By default setPL sets the MMIO register to all zeros and locks it, which should disable it, leaving only the MSR for the processor to use, which can then be changed any number of times during the session via setPL. There's a remote chance your system may require the MMIO's time window to be set as well, but I can't really see how that could be since setPL disables that register.

Petrusion commented 3 weeks ago

I see. In that case it is working correctly, but somehow the CPU ignores the new values even though they are in the MSR before and after my testing. Weird that it happens in Linux but not in Windows. Any idea what it could be?

horshack-dpreview commented 3 weeks ago

How are you determining the CPU isn't reaching 45W under Linux? Based on what's reported by an app like turbostat? Or indirectly based on observed performance?

Petrusion commented 3 weeks ago

It is reaching 45W, the problem is it won't use more than 45W for more than a minute. The processor's tdp is 45 but it is set too aggressively low by the manufacturer because the cooling can handle even 80-90, so I wanna solve that by being permanently in PL1/2 under load.

I'm using MangoHud to see the power draw, which reads it from /sys/class/powercap/intel-rapl\:0/energy_uj.
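For cross-checking a tool's reading by hand: energy_uj is a running energy counter in microjoules, so average power is the difference between two samples divided by the elapsed time. A minimal sketch, assuming the counter does not wrap between the two samples; the function name power_mw is hypothetical, and reading energy_uj may require root on recent kernels.

```shell
# Average power in milliwatts between two energy_uj samples.
#   $1 = first sample (uJ), $2 = second sample (uJ), $3 = interval (ms)
# uJ divided by ms gives mW. Counter wraparound is NOT handled here.
power_mw() {
  echo $(( ($2 - $1) / $3 ))
}

# Example (path as given in the comment above):
#   f=/sys/class/powercap/intel-rapl:0/energy_uj
#   a=$(cat "$f"); sleep 1; b=$(cat "$f")
#   power_mw "$a" "$b" 1000
```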

horshack-dpreview commented 3 weeks ago

Can you try changing "F_DISABLE_MMIO_PL1_PL2" to $FALSE and see if that affects it. This will set the MMIO PL values to the same as the MSR rather than disabling the MMIO PL values.

Petrusion commented 3 weeks ago

> Can you try changing "F_DISABLE_MMIO_PL1_PL2" to $FALSE and see if that affects it.

It doesn't help.

horshack-dpreview commented 3 weeks ago

I should have mentioned that you need to reboot to try the change to F_DISABLE_MMIO_PL1_PL2. The lock my script puts on the MMIO register prevents any changes to it for the duration of that power-on session, so if you ran setPL in that same session before changing F_DISABLE_MMIO_PL1_PL2, the change wouldn't apply.

Petrusion commented 3 weeks ago

Yes, I figured as much. I rebooted before attempting it.

horshack-dpreview commented 3 weeks ago

Can you verify that PL1/PL2 is still set to your desired value at the moment MangoHud reports power consumption drops back to 45W? You can use this to see the PL1/PL2 values:

turbostat sleep 0 2>&1 | grep MSR_PKG_POWER_LIMIT -A 2

Petrusion commented 3 weeks ago

Yes, turbostat sleep 0 2>&1 | grep MSR_PKG_POWER_LIMIT -A 2 reports 14336.000000 sec and the correct watts even after the CPU gets limited to 45W, both with F_DISABLE_MMIO_PL1_PL2 set to $FALSE and to $TRUE.
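The 14336 sec reported here, rather than the 14400 sec requested, is consistent with how the MSR stores the PL1 time window: as 2^Y * (1 + Z/4) time units, with Y in bits 21:17 and Z in bits 23:22 of MSR_PKG_POWER_LIMIT. A minimal sketch of that encoding follows. It assumes a time unit of 1/1024 s (an exponent of 10 in MSR_RAPL_POWER_UNIT, which should be read from the machine rather than taken for granted) and assumes the encoder rounds down. Both function names are hypothetical.

```shell
# Decode the PL1 time window, in whole seconds, from a raw
# MSR_PKG_POWER_LIMIT value.
#   $1 = raw MSR value, $2 = time unit exponent (unit = 1/2^$2 s)
decode_pl1_window_s() {
  y=$(( ($1 >> 17) & 31 ))   # bits 21:17
  z=$(( ($1 >> 22) & 3 ))    # bits 23:22
  echo $(( ((1 << $y) * (4 + $z) / 4) >> $2 ))
}

# Largest representable window (seconds) not above a positive request,
# ASSUMING the encoder rounds down.
#   $1 = requested seconds, $2 = time unit exponent
representable_window_s() {
  units=$(( $1 << $2 ))
  y=0
  while [ $(( $units >> ($y + 1) )) -gt 0 ]; do y=$(( $y + 1 )); done
  z=$(( 4 * ($units - (1 << $y)) / (1 << $y) ))
  echo $(( ((1 << $y) * (4 + $z) / 4) >> $2 ))
}

# representable_window_s 14400 10   gives 14336 (Y=23, Z=3)
```

With msr-tools installed, the raw value could be obtained with rdmsr 0x610 and the unit register with rdmsr 0x606. Both need root and the msr kernel module.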

horshack-dpreview commented 3 weeks ago

Hmm, at this point I'm thinking the processor is dropping down due to the throttling events that are designed to override PL1/PL2. Have you tried monitoring those events to see which if any are occurring? There are tools under Windows to monitor this but I'm not sure what's available under Linux.

Petrusion commented 3 weeks ago

I have a hypothesis, maybe setPL works after all... I just noticed that the temperature of both the CPU and the GPU never seems to go above 80°C. There must be something in Linux trying to keep both of them under 80°C, which is extremely annoying since I never set up anything like that. This CPU is designed to thermal throttle at 100°C and the GPU at 86°C, so of course it won't run above 45W when there is some evil piece of code or whatever blocking it at 80.

I'm trying to do something about it but I'm out of my depth here. Do you have any ideas what it might be?

horshack-dpreview commented 3 weeks ago

I'm not familiar with any potential logic that would enforce a software-based throttle below the processor's normal threshold but a quick google search reveals there are such system components. I would check the system/kernel logs first and see if there is any message related to throttling.
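A minimal sketch of that log check: a filter that keeps only lines mentioning throttling, meant to be fed from the kernel log. The function name filter_throttle_lines is hypothetical, and the exact wording of any messages depends on the kernel and the services installed.

```shell
# Keep only log lines that mention throttling (case-insensitive).
filter_throttle_lines() {
  grep -i 'throttl'
}

# Example usage (may need root):
#   dmesg | filter_throttle_lines
#   journalctl -k -b | filter_throttle_lines
#
# On Intel systems the kernel also keeps per-CPU throttle counters, which
# can be compared before and after a load test:
#   cat /sys/devices/system/cpu/cpu0/thermal_throttle/package_throttle_count
```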