microsoft / Windows-Dev-Performance

A repo for developers on Windows to file issues that impede their productivity, efficiency, and efficacy
MIT License
434 stars 20 forks

Unexpected Sleep(1) Precision with Different Clock Interrupt Frequencies (Timer Resolution) #115

Open ghost opened 7 months ago

ghost commented 7 months ago

Windows Build Number

10.0.19045.0

Processor Architecture

AMD64

Memory

2x8 GB DDR4

Storage Type, free / capacity

NVME SSD 138 GB/ 512 GB

Relevant apps installed

N/A.

Traces collected via Feedback Hub

N/A.

Issue description

Hello,

I have not found this issue reported anywhere else. I am developing an application that requires sub-millisecond precision from calls such as Sleep(1). To achieve this, I raise the clock interrupt frequency by calling NtSetTimerResolution. To find a good balance between power efficiency and precision, I benchmarked how the precision of Sleep(1) scales with the requested Timer Resolution, using QueryPerformanceCounter to measure the actual sleep duration. However, I encountered unexpected results.

#include <iomanip>
#include <iostream>
#include <vector>
#include <windows.h>

#pragma comment(lib, "ntdll.lib") // MSVC: link ntdll.lib for NtSetTimerResolution

extern "C" NTSYSAPI NTSTATUS NTAPI NtSetTimerResolution(ULONG DesiredResolution, BOOLEAN SetResolution, PULONG CurrentResolution);

int main() {
    // benchmark requested Timer Resolutions from 0.5ms to 1ms
    double begin = 0.5;
    double stop = 1;
    double increment = 0.002;
    int samples = 20;

    ULONG current_resolution;
    LARGE_INTEGER start, end, freq;

    QueryPerformanceFrequency(&freq);

    std::cout << std::fixed << std::setprecision(6);
    std::cout << "RequestedResolutionMs,DeltaMs\n";

    for (double resolution = begin; resolution <= stop; resolution += increment) {
        // NtSetTimerResolution takes the resolution in 100ns units
        NtSetTimerResolution((ULONG)(resolution * 10000), TRUE, &current_resolution);

        // average 20 Sleep(1) measurements for each resolution
        std::vector<double> sleep_delays;

        for (int i = 0; i < samples; i++) {
            // benchmark Sleep(1)
            QueryPerformanceCounter(&start);
            Sleep(1);
            QueryPerformanceCounter(&end);

            double delta_s = (double)(end.QuadPart - start.QuadPart) / freq.QuadPart;
            double delta_ms = delta_s * 1000;
            double delta_from_sleep = delta_ms - 1; // overshoot beyond the requested 1ms

            sleep_delays.push_back(delta_from_sleep);
        }

        double sum = 0.0;
        for (double delay : sleep_delays) {
            sum += delay;
        }

        double average = sum / sleep_delays.size();
        std::cout << resolution << "," << average << "\n";
    }
}
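A note on measurement hygiene: the deltas above include scheduler noise, since the benchmarking thread can be preempted between the two QueryPerformanceCounter calls. Here is a minimal sketch (not part of the program above) that reduces this noise by raising the process and thread priority before sampling:

#include <windows.h>

// Raise priority so the benchmarking thread is less likely to be
// preempted between the two QueryPerformanceCounter calls.
// THREAD_PRIORITY_TIME_CRITICAL should be used with care outside benchmarks.
void raise_benchmark_priority() {
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
}

Calling this once at the top of main(), before the benchmark loop, is enough.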

The program above outputs the Sleep(1) delta from 1ms at different clock interrupt intervals in CSV format. As the graph below shows, Sleep(1) precision improves as the clock interrupt frequency increases, as I would expect. However, the sleep precision at a 0.5ms Timer Resolution is worse than at a slightly coarser resolution such as 0.506ms; in fact, a 0.5ms resolution provides roughly the same precision as ~0.745ms. How does this make sense? I asked a group of people online to repeat my benchmark, and the behaviour was reproducible on several machines (30+).

[Graph: average Sleep(1) delta from 1ms against requested Timer Resolution]

Why is this the case? Is there anything that can be done to resolve this? Developers who query the maximum supported resolution with NtQueryTimerResolution and then request exactly that value (0.5ms) are missing out on precision because of this. It is almost as if a slight offset from the maximum yields higher precision (0.5ms + 0.006ms in my case). The behaviour outlined in this issue is certainly not expected, in the sense that a higher resolution should never result in the same sleep precision as a lower one. A sketch of that common pattern and the offset workaround follows below.
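To illustrate the pattern, here is a minimal sketch; the offset value is the empirical one from my benchmark, not a documented constant:

extern "C" NTSYSAPI NTSTATUS NTAPI NtQueryTimerResolution(PULONG MinimumResolution, PULONG MaximumResolution, PULONG CurrentResolution);
extern "C" NTSYSAPI NTSTATUS NTAPI NtSetTimerResolution(ULONG DesiredResolution, BOOLEAN SetResolution, PULONG CurrentResolution);

// Resolutions are in 100ns units; MaximumResolution is the finest
// (smallest) supported value, typically 5000 (0.5ms).
void set_high_resolution() {
    ULONG minimum, maximum, current;
    NtQueryTimerResolution(&minimum, &maximum, &current);

    // Requesting exactly `maximum` hits the precision anomaly described above.
    // Backing off by a small empirical offset (60 * 100ns = 0.006ms on my
    // machine; this value is machine-dependent) avoids it.
    NtSetTimerResolution(maximum + 60, TRUE, &current);
}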

Steps to reproduce

  1. Compile the program with ntdll.lib configured as a linker dependency (example commands below)
  2. Close any external programs that may be requesting a resolution higher than 1ms
  3. Run the program and redirect stdout to results.txt (or copy and paste the output into it)
  4. Visit https://chart-studio.plotly.com/create and click Import -> Upload -> results.txt
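For reference, with MSVC the first and third steps might look like this (timer_bench.cpp is a placeholder file name):

cl /O2 /EHsc timer_bench.cpp ntdll.lib
timer_bench.exe > results.txt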

Expected Behavior

A 0.5ms Timer Resolution results in high Sleep(1) precision and outperforms lower Timer Resolutions such as 0.506ms.

Actual Behavior

A 0.5ms Timer Resolution results in low Sleep(1) precision and underperforms compared to a lower Timer Resolution such as 0.506ms.

calculusteacher commented 7 months ago

After following the steps outlined in the instructions, I can confirm that there seems to be a bug: on my machine, a lower Timer Resolution results in higher precision than a 0.5ms Timer Resolution.

[Graph: benchmark results reproducing the anomaly]

noshrimps commented 7 months ago

I can reproduce this behaviour. A resolution of 0.506ms provides higher Sleep(1) precision than a resolution of 0.5ms.

[Graph: Sleep(1) precision at 0.506ms vs 0.5ms]

Strangely, I found that using BCDEdit /set to change useplatformtick to yes affects the precision at a 0.5ms resolution on one machine, while on another machine useplatformtick has no effect. The exact commands are below.

[Graph: the same benchmark with useplatformtick enabled]
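For anyone who wants to try the same experiment, these are the commands (run from an elevated prompt; a reboot is required, and /deletevalue restores the default):

bcdedit /set useplatformtick yes
bcdedit /deletevalue useplatformtick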

ghost commented 7 months ago

Interesting results on Windows 7 SP1.

[Graph: benchmark results on Windows 7 SP1]

AdamBraden commented 1 month ago

The NtSetTimerResolution API is undocumented and its behavior is unsupported. I've raised this issue with the appropriate team as a new feature request for a high-resolution timer. I recommend you file a Feedback Hub issue under Developer Platform -> API Feedback and share the link here so others can easily find it and vote for it. I can then connect it to the internal feature request. Like all feature requests, I cannot guarantee if or when it will be worked on, but every vote helps.

JonathahWasTaken commented 1 month ago

For all gamers, music producers, and everyone else requiring low latency and precision: please vote for the Feedback Hub issue 🙏

https://aka.ms/AAqwulg

feedback-hub:?contextid=950&feedbackid=b477e9ab-b9b7-431b-a393-9cf77718d034&form=1&src=1