microsoft / Windows-Dev-Performance

A repo for developers on Windows to file issues that impede their productivity, efficiency, and efficacy
MIT License

NT heap scales horrendously in some cases #106

Open Donpedro13 opened 2 years ago

Donpedro13 commented 2 years ago

Windows Build Number

Win32NT 10.0.22000.0 Microsoft Windows NT 10.0.22000.0

Processor Architecture

AMD64

Memory

64 GB

Storage Type, free / capacity

SSD 80/512 GB

Relevant apps installed

-

Traces collected via Feedback Hub

We collected profiling traces with both Visual Studio and VTune; we can provide these via a private channel if needed.

Issue description

Recently, my current employer started measuring the multithreaded performance of a commercial application. We were interested in both raw performance numbers and scalability with respect to CPU core count. We were surprised to see that some operations scale terribly: their durations actually increase with the core count (the opposite of what one would usually expect). Profiling revealed that in most of the problematic cases the biggest bottleneck was the NT heap, due to its scalability problems. We also measured with other heaps (Intel's TBB and the Segment Heap, to name a few), and none of them exhibited the same phenomenon.
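
The commercial workload cannot be shared, but the shape of this kind of measurement can be sketched. Below is a hypothetical, minimal C++ micro-benchmark (not the application from this report) that splits a fixed number of small malloc/free pairs across an increasing number of threads and times the whole run, thread creation included. On Windows with the Universal CRT, malloc is typically serviced by the process's default heap, which for a typical desktop process is the NT heap unless the Segment Heap has been opted into. The names run_alloc_benchmark, print_scaling_table, kBatch, and all workload sizes are illustrative assumptions.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

// Global sink: folding every returned pointer into it makes the pointers
// observable, so the compiler cannot elide the malloc/free pairs below.
std::atomic<std::uintptr_t> g_sink{0};

// Hypothetical number of blocks each thread keeps alive before freeing them.
constexpr std::size_t kBatch = 256;

// Hypothetical micro-benchmark: `total_ops` malloc/free pairs are split evenly
// across `threads` workers; returns the wall-clock time of the whole run in
// seconds. With a well-scaling allocator the time drops as threads are added;
// with a contended one it can stay flat or even grow.
double run_alloc_benchmark(unsigned threads, std::size_t total_ops, std::size_t block_size) {
    assert(threads > 0 && "at least one worker thread is required");
    const std::size_t ops_per_thread = total_ops / threads;

    std::vector<std::thread> workers;
    workers.reserve(threads);
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([ops_per_thread, block_size] {
            std::vector<void*> live;
            live.reserve(kBatch);  // reserved once, so push_back never reallocates
            std::uintptr_t local = 0;
            for (std::size_t i = 0; i < ops_per_thread; ++i) {
                void* p = std::malloc(block_size);  // goes to the CRT's heap
                if (p == nullptr) continue;
                local ^= reinterpret_cast<std::uintptr_t>(p);
                live.push_back(p);
                if (live.size() == kBatch) {  // free a full batch at once
                    for (void* q : live) std::free(q);
                    live.clear();
                }
            }
            for (void* q : live) std::free(q);
            g_sink.fetch_xor(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Prints the elapsed time for 1, 2, 4, ... threads up to the hardware thread count.
void print_scaling_table(std::size_t total_ops, std::size_t block_size) {
    const unsigned max_threads = std::max(1u, std::thread::hardware_concurrency());
    for (unsigned t = 1; t <= max_threads; t *= 2) {
        std::printf("%3u threads: %.3f s\n", t, run_alloc_benchmark(t, total_ops, block_size));
    }
}
```

Calling print_scaling_table(4'000'000, 64) from a main function prints one line per thread count. Running the same binary once on the default heap and once with the Segment Heap enabled would give a side-by-side comparison, but any numbers it produces say nothing about the application in this report.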

Here's a chart plotting some of our measurements; lower is better (Y-axis: time to perform a certain operation, in seconds; X-axis: number of CPUs):

[Chart: time to perform the operation (s) vs. number of CPUs; lower is better]

Here's a second chart comparing the NT heap and the Segment Heap; a value below 100% means that the Segment Heap performed better (Y-axis: Segment Heap/NT heap relative time, as a percentage; X-axis: number of CPUs):

[Chart: Segment Heap/NT heap relative time (%) vs. number of CPUs]
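
The relative metric plotted here can be written as follows, where $T_{\text{Segment}}(n)$ and $T_{\text{NT}}(n)$ denote the measured time of the same operation on $n$ CPUs with each heap (the symbols are introduced only for clarity):

```latex
r(n) = \frac{T_{\text{Segment}}(n)}{T_{\text{NT}}(n)} \times 100\%
```

So $r(n) < 100\%$ means the Segment Heap was faster on $n$ CPUs, while, for example, $r(n) = 110\%$ would mean it took 10% longer than the NT heap.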

I'm aware that this is a bit vague; I can provide the whole dataset through a private channel if required.

We opened a PSfD support case (we can provide the case number if needed), as we believed we might be hitting a pathological path in the NT heap implementation that should be fixed on Microsoft's side. We were basically told that:

That's all well and good; we wouldn't mind switching to the Segment Heap, per se. However, there are many cases where the Segment Heap performs worse than the "classic" one. The chart below includes every data point of every measurement we did, as relative performance; a value above 100% means that the Segment Heap performed worse:

[Chart: Segment Heap/NT heap relative performance (%) for every measured data point]

Giving up some performance (about 10% on average) in many cases in exchange for better scalability in others does not seem like a very good deal.

Is this expected? We would prefer to stay on a heap that's part of the operating system (either the "classic" NT heap or the Segment Heap), but trade-offs like these make that not worth it.

Steps to reproduce

No easy repro (the phenomenon in question was reproduced in a commercial application that requires a license and some setup/installation steps).

Expected Behavior

The NT heap scales at an acceptable level, or the Segment Heap performs at least as well as the "classic" NT heap in every case.

Actual Behavior

The NT heap scales horrendously in some cases. The Segment Heap scales well but performs worse in many cases.

AvriMSFT commented 2 years ago

Hey! Thanks for reporting and giving such detailed descriptions of the issue🙂. I'm working on routing this issue to the right team and will report back soon.

Eli-Black-Work commented 1 year ago

@AvriMSFT Were you able to route this issue to the right team?