microsoft / Windows-Dev-Performance

A repo for developers on Windows to file issues that impede their productivity, efficiency, and efficacy

PrefetchVirtualMemory does not prefetch virtual memory (windows api bug) #108

Open no-j opened 2 years ago

no-j commented 2 years ago

Windows Build Number

10.0.19044.0

Processor Architecture

AMD64

Memory

16GB

Storage Type, free / capacity

SSD 512GB

Relevant apps installed

Windows

Traces collected via Feedback Hub

N/A

Issue description

I tested PrefetchVirtualMemory as a way to reduce page faults when reading a file through a file mapping. I measured the performance (CPU ticks via the __rdtsc() intrinsic) and the actual page fault count (via GetProcessMemoryInfo) while processing files, with and without a call to PrefetchVirtualMemory.

To my surprise, PrefetchVirtualMemory does not work as advertised and does not do its only job, which is to fetch the memory from disk as efficiently as possible to minimize page faults.

I used the following process for testing, loading a range of file sizes (I'll share my results for a ~500MB file). I opened a file handle, created a file mapping and a view of the whole file, then processed the whole file, making sure to touch every page of the mapped view. Then I repeated the process, this time calling PrefetchVirtualMemory on the view before processing the file. I got the exact same overall page fault count in both cases, and to add insult to injury, performance was slightly worse with PrefetchVirtualMemory, which makes the call useless, since reducing page faults is its only job.
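
The "touch every page" step can be isolated in a small helper. This is a minimal sketch, not the code used for the measurements: touch_every_page is a hypothetical name, and the 4096-byte default page size is an assumption (on Windows the real value can be queried with GetSystemInfo). It reads one byte per page, which should be enough to fault each page in when the buffer starts on a page boundary, as a mapped view does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads one byte from each page of [data, data + size)
// and returns the number of pages touched.
// Assumptions: `data` starts on a page boundary, and `page_size` matches the
// system page size (4096 is only a default guess).
std::size_t touch_every_page(const volatile char *data, std::size_t size,
                             std::size_t page_size = 4096) {
    assert(page_size > 0);
    std::size_t pages = 0;
    char sink = 0;
    for (std::size_t offset = 0; offset < size; offset += page_size) {
        sink ^= data[offset];  // volatile read, so the access is not optimized away
        ++pages;
    }
    (void)sink;
    return pages;
}
```

For a mapped view, the call would look like touch_every_page(view, size), where view is the mapped base address and size is the file size.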

Steps to reproduce

Run the following code on Windows:

#include <windows.h> // must come before psapi.h
#include <psapi.h>   // for GetProcessMemoryInfo
#include <stdio.h>   // for printf / fprintf

char process_file(const char *file_name, bool use_prefetch_virtual_memory) {
    HANDLE file_handle;
    file_handle = CreateFileA(file_name, GENERIC_READ, FILE_SHARE_READ,
                              0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);

    if(file_handle == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Could not open the file '%s'\n", file_name);
        return 0;
    }

    LARGE_INTEGER file_size;
    if(!GetFileSizeEx(file_handle, &file_size)) {
        CloseHandle(file_handle);
        fprintf(stderr, "Could not get the size of the file '%s'\n", file_name);
        return 0;
    }

    size_t size = file_size.QuadPart;

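    HANDLE mapping_handle = CreateFileMapping(file_handle, NULL, PAGE_READONLY | SEC_COMMIT, 0, 0, NULL);
    if(mapping_handle == NULL) {
        CloseHandle(file_handle);
        fprintf(stderr, "Could not create a file mapping for '%s'\n", file_name);
        return 0;
    }

    const char *view = (const char *)MapViewOfFile(mapping_handle, FILE_MAP_READ, 0, 0, 0);
    if(view == NULL) {
        CloseHandle(mapping_handle);
        CloseHandle(file_handle);
        fprintf(stderr, "Could not map a view of the file '%s'\n", file_name);
        return 0;
    }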

    PROCESS_MEMORY_COUNTERS counter0, counter1, counter2;
    GetProcessMemoryInfo(GetCurrentProcess(), &counter0, sizeof(counter0));

    WIN32_MEMORY_RANGE_ENTRY memory_range_entry[] = {(void *)view, size};
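    if(use_prefetch_virtual_memory &&
       !PrefetchVirtualMemory(GetCurrentProcess(), 1, memory_range_entry, 0)) {
        // Report a failed call so it is not mistaken for an ineffective prefetch
        fprintf(stderr, "PrefetchVirtualMemory failed (error %lu)\n", GetLastError());
    }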

    GetProcessMemoryInfo(GetCurrentProcess(), &counter1, sizeof(counter1));

    char result = 0; // I'm using "result" here only so that the optimizer won't skip this loop
    for(size_t i = 0; i < size; ++i) { // size_t: an int index would overflow on files of 2 GiB or more
        result += view[i];
    }

    GetProcessMemoryInfo(GetCurrentProcess(), &counter2, sizeof(counter2));

    DWORD prefetch_fault_count = counter1.PageFaultCount - counter0.PageFaultCount;
    DWORD manual_fault_count   = counter2.PageFaultCount - counter1.PageFaultCount;

    printf("Using PrefetchVirtualMemory: %s\n", use_prefetch_virtual_memory ? "yes" : "no");

    // DWORD is an unsigned long, so print it with %lu rather than %d
    printf("Page faults (during prefetch):        %lu\n", prefetch_fault_count);
    printf("Page faults (during file processing): %lu\n", manual_fault_count);
    printf("Page faults (overall):                %lu\n", prefetch_fault_count + manual_fault_count);

    UnmapViewOfFile(view);
    CloseHandle(mapping_handle);
    CloseHandle(file_handle);

    return result;
}

Expected Behavior

Using PrefetchVirtualMemory: no
Page faults (during prefetch):        0
Page faults (during file processing): 161027
Page faults (overall):                161027
Using PrefetchVirtualMemory: yes
Page faults (during prefetch):        314
Page faults (during file processing): **MUCH LOWER** than 160713
Page faults (overall):                **MUCH LOWER** than 161027

Actual Behavior

Using PrefetchVirtualMemory: no
Page faults (during prefetch):        0
Page faults (during file processing): 161027
Page faults (overall):                161027
Using PrefetchVirtualMemory: yes
Page faults (during prefetch):        314
Page faults (during file processing): 160713
Page faults (overall):                161027
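
The comparison between the two runs can be written down as a small calculation. This is only a sketch using the numbers reported above; PhaseFaults and faults_saved are hypothetical names introduced for illustration. It shows that the 314 faults taken during the prefetch are exactly the ones missing from the processing phase, so the total does not change.

```cpp
#include <cstdint>

// Hypothetical record of the per-phase page fault deltas printed by the repro.
struct PhaseFaults {
    std::uint32_t during_prefetch;
    std::uint32_t during_processing;

    std::uint32_t total() const { return during_prefetch + during_processing; }
};

// Faults avoided by the prefetch run relative to the baseline run.
// Zero means no benefit; a negative value means the prefetch run was worse.
std::int64_t faults_saved(const PhaseFaults &baseline,
                          const PhaseFaults &with_prefetch) {
    return static_cast<std::int64_t>(baseline.total()) -
           static_cast<std::int64_t>(with_prefetch.total());
}
```

With the reported values, faults_saved(PhaseFaults{0, 161027}, PhaseFaults{314, 160713}) evaluates to 0.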

AlexeiScherbakov commented 10 months ago

I'm also wondering how exactly to use this API function and what it actually does.

We need to find a way to load the file into memory as quickly as possible.
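
One baseline worth measuring against for that goal is a single large read of the whole file into a buffer. The following is a minimal sketch in portable C++, assuming the file fits in memory. read_whole_file is a hypothetical helper name, and nothing here is claimed to be faster than the file mapping approach; it is only a simple point of comparison.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into `out` with one
// fread call. Returns false if the file cannot be opened, sized, or fully read.
// Caveat: ftell returns a long, which is 32-bit on Windows, so this sketch is
// limited to files under 2 GiB there.
bool read_whole_file(const std::string &path, std::vector<char> &out) {
    std::FILE *f = std::fopen(path.c_str(), "rb");
    if (!f) return false;

    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        long size = std::ftell(f);
        if (size >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            out.resize(static_cast<std::size_t>(size));
            // An empty file is a successful read of zero bytes
            ok = out.empty() ||
                 std::fread(out.data(), 1, out.size(), f) == out.size();
        }
    }
    std::fclose(f);
    return ok;
}
```

A caller would pass a path and an empty std::vector<char>, then check the return value before using the buffer.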