microsoft / Windows-Dev-Performance

A repo for developers on Windows to file issues that impede their productivity, efficiency, and efficacy
MIT License

Throttling detection needed in Windows #101

Open randomascii opened 2 years ago

randomascii commented 2 years ago

Windows Build Number

Win32NT 10.0.19042.0 Microsoft Windows NT 10.0.19042.0

Processor Architecture

AMD64

Memory

32 GB

Storage Type, free / capacity

C: SSD 44/254 GB

Relevant apps installed

Windows Performance Toolkit

Traces collected via Feedback Hub

Sorry, no traces.

Issue description

Some computers, especially but not exclusively laptops, may throttle their CPUs due to excessive heat, insufficient power, or other reasons. This throttling can be significant: in some cases the CPU may run at 25% of its nominal rate or lower even though the system is under heavy load and is CPU bound. This throttling makes the computer unresponsive or sluggish.

Users who encounter this are unlikely to understand the cause and are likely to blame whatever software they are running that causes the heavy load. This was documented in 2013 here: https://randomascii.wordpress.com/2013/08/06/defective-heat-sinks-causing-garbage-gaming/

In many of the cases referenced above the consumers were adamant that only one game had the performance problem and therefore the game was at fault, even though the problem was defective cooling.

Consumers need a way to tell when their CPU is being throttled, if only some sort of indicator on Task Manager's CPU metrics. Developers also need some way of detecting this. It is possible to record the CPU's current speed, but on newer versions of Windows PROCESSOR_POWER_INFORMATION.CurrentMhz no longer records the "requested" CPU speed, so there is no way to tell if the CPU is running slowly due to throttling or due to low load.
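For readers unfamiliar with the API mentioned above, here is a minimal sketch of how PROCESSOR_POWER_INFORMATION can be read through `CallNtPowerInformation` in `powrprof.dll`. The helper name `read_processor_power_info` is made up for illustration, and the sketch assumes a machine with a single processor group (at most 64 logical processors). It only shows how the values are obtained; as described above, the returned CurrentMhz alone does not say whether a low speed comes from throttling or from low load.

```python
import ctypes
import os
import sys

# POWER_INFORMATION_LEVEL value for ProcessorInformation.
PROCESSOR_INFORMATION = 11


class PROCESSOR_POWER_INFORMATION(ctypes.Structure):
    # All six fields are 32-bit ULONGs in the Windows headers.
    _fields_ = [
        ("Number", ctypes.c_uint32),
        ("MaxMhz", ctypes.c_uint32),
        ("CurrentMhz", ctypes.c_uint32),
        ("MhzLimit", ctypes.c_uint32),
        ("MaxIdleState", ctypes.c_uint32),
        ("CurrentIdleState", ctypes.c_uint32),
    ]


def read_processor_power_info():
    """Hypothetical helper: return (Number, MaxMhz, CurrentMhz, MhzLimit) per CPU.

    Returns an empty list on non-Windows systems. Assumes one processor group,
    so os.cpu_count() matches the number of entries the call fills in.
    """
    if sys.platform != "win32":
        return []
    count = os.cpu_count() or 1
    buf = (PROCESSOR_POWER_INFORMATION * count)()
    status = ctypes.windll.powrprof.CallNtPowerInformation(
        PROCESSOR_INFORMATION, None, 0, buf, ctypes.sizeof(buf)
    )
    if status != 0:  # NTSTATUS: 0 means success
        raise OSError(f"CallNtPowerInformation failed: 0x{status & 0xFFFFFFFF:08X}")
    return [(p.Number, p.MaxMhz, p.CurrentMhz, p.MhzLimit) for p in buf]


if __name__ == "__main__":
    for number, max_mhz, current_mhz, limit_mhz in read_processor_power_info():
        print(f"CPU {number}: current {current_mhz} MHz, max {max_mhz} MHz, limit {limit_mhz} MHz")
```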

Steps to reproduce

Run a CPU-intensive workload on a Lenovo P51 running on battery (power throttling) or on a machine with a defective heat sink (thermal throttling) and note that the CPU speed drops with no indication to the user.

Expected Behavior

There should be an indication of thermal throttling to the user, and some sort of ETW events to make it clear to ETW trace analysts.

Actual Behavior

There is no indication to the user that there is a problem, and there is no indication in ETW traces that the poor CPU performance is due to throttling.

AvriMSFT commented 2 years ago

Hey @randomascii thanks for submitting this issue and apologies for the delayed response! I've reached out to some folks on the issue and will get back to you with updates once I have them. Thanks for your patience!

Eli-Black-Work commented 1 year ago

Having some sort of visual indicator in Task Manager that indicates CPU throttling is a great idea 🙂

randomascii commented 1 year ago

This issue has recently become particularly personally interesting for me because my work laptop has started intermittently throttling. I don't know if it is thermal throttling, power throttling, or something else. I suspect thermal throttling, perhaps due to thermal paste or some other aspect of the laptop degrading after ~four years of use.

Diagnosing this was tricky. I initially noticed intermittent sluggishness, then I went to Task Manager, which showed the CPU was only ~27% busy. At first it seemed odd that the computer would feel so sluggish when 73% of the CPU time was available, and I thought maybe the sluggishness was elsewhere in the system. But then I noticed that the CPU was running at ~0.8 GHz. Then I remembered that Task Manager shows percentage busy as a percentage of the nominally available CPU cycles, so in fact the CPU was 100% busy at 0.8 GHz and the sluggishness was from the low speed.

Which is to say, it took rather too much math (I had to multiply the 27% busy amount by the nominal frequency divided by the actual frequency in order to tell that the CPU was in fact fully occupied) and too much guessing ("I guess it's thermal throttling?"), even though I know that the CPU records its temperature and records when it hits thermal limits.
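The arithmetic described above can be sketched as a small helper. The function name `effective_utilization` is made up for illustration, and the 2.9 GHz nominal frequency in the example is an assumption (the comment does not state the laptop's nominal speed); only the 27% and 0.8 GHz figures come from the comment.

```python
def effective_utilization(busy_pct, nominal_ghz, actual_ghz):
    """Rescale a busy percentage measured against nominal cycles
    to a percentage of the cycles actually available at the current clock.

    busy_pct    -- busy percentage as reported against the nominal frequency
    nominal_ghz -- the CPU's nominal frequency (assumed known)
    actual_ghz  -- the frequency the CPU is actually running at
    """
    if nominal_ghz <= 0 or actual_ghz <= 0:
        raise ValueError("frequencies must be positive")
    return busy_pct * nominal_ghz / actual_ghz


if __name__ == "__main__":
    # 27% busy at 0.8 GHz, with an ASSUMED nominal frequency of 2.9 GHz.
    pct = effective_utilization(27.0, 2.9, 0.8)
    print(f"Effective utilization: {pct:.1f}%")
```

With those inputs the result is just under 100%, which matches the observation that the CPU was effectively fully occupied despite the 27% reading.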

So, please show the throttling reasons. And please consider changing how Task Manager displays CPU usage. It rarely makes any difference during normal operation, but when throttling happens it is extremely confusing.

stolk commented 1 year ago

Ugh... I will have to inform my customers that TurboLEDz no longer works under Windows. Disappointing.

The OS itself is still able to retrieve core frequencies, though, as you can see them in Task Manager.

I bet this was done to stop an exploit: if you can measure how busy cores are, you can get a modicum of information on secrets.

Frankly, as this call required administrator rights already, Microsoft should have left this one alone.

jwatte commented 11 months ago

It's fine, you can just throw in a bogomips loop of 100,000 dry iterations before each vblank in your game, and time how long it takes using HPET or something. You'll be able to detect down-clocked CPUs that way. Yes, that's a terrible idea. Yes, that's how it will be done, as long as the OS lies to us.
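The workaround described above (time a fixed amount of busy work and compare it against a known-good baseline) can be sketched roughly as follows. This is an illustration only: the function names, the 2x slowdown threshold, and the use of `time.perf_counter_ns` in place of HPET are all assumptions, and a slow result can also be caused by scheduling noise or contention rather than a down-clocked CPU.

```python
import time


def time_busy_loop(iterations=100_000):
    """Time a fixed number of dry loop iterations; returns elapsed nanoseconds."""
    start = time.perf_counter_ns()
    acc = 0
    for i in range(iterations):
        acc += i  # trivial work so the loop has a body
    return time.perf_counter_ns() - start


def looks_throttled(elapsed_ns, baseline_ns, slowdown_factor=2.0):
    """Heuristic: flag a run that took more than slowdown_factor times the baseline.

    slowdown_factor is an arbitrary, assumed threshold.
    """
    return elapsed_ns > baseline_ns * slowdown_factor


if __name__ == "__main__":
    # Take the fastest of a few runs as the baseline to reduce noise.
    baseline = min(time_busy_loop() for _ in range(5))
    sample = time_busy_loop()
    print(f"baseline {baseline} ns, sample {sample} ns, "
          f"throttled? {looks_throttled(sample, baseline)}")
```

In a real application the baseline would have to be captured while the machine is known to be running at full speed, which is exactly the information the OS does not expose.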