microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License
94.85k stars 8.21k forks source link

Weird slowness scolling text in the console #143

Closed powercode closed 5 years ago

powercode commented 6 years ago

When outputting text from PowerShell on a Dell XPS 15, the rendering speed is amazingly slow.

I have not come across this on any other machine.

Attached a short windows performance recording of this. conhost.exe (1608) and pwsh.exe (19432) are the relevant processes.

STAFFANX15.03-30-2018.11-25-22.zip

Hopefully someone more well versed in this domain can help figure out what is going on.

IISResetMe commented 6 years ago

It's caused by conhost.exe excessively hammering the registry to query the following two values:

HKCU\Software\Microsoft\Windows\CurrentVersion\ImmersiveShell\TabletMode
HKCU\Software\Microsoft\Windows\DWM\ColorPrevalence

Having previously reported this issue through Feedback Hub (to no avail), let me offer my observations from debugging this issue:

An easy way of show the resulting difference in speed is to compare to anything piped to Out-String:

$WindowHeight = $Host.UI.RawUI.WindowSize.Height
Clear-Host
$StopWatch = [Diagnostics.Stopwatch]::StartNew()
&{
    1..($WindowHeight * 2)|ForEach-Object{
        Write-Output "test"
    }
}
$MultiString = $StopWatch.Elapsed
Clear-Host
$StopWatch = [Diagnostics.Stopwatch]::StartNew()
&{
    1..($WindowHeight * 2)|ForEach-Object{
        Write-Output "test"
    }
} | Out-String
$SingleString = $StopWatch.Elapsed

Even though we add overhead from Out-String, and the number of lines that the console host eventually need to write to the screen buffer are exactly the same, you'll find that the $MultiString measurement is significantly larger that $SingleString, presumably because the former caused the console host to scroll $WindowsHeight where as the latter only had to cause a single scroll.

zadjii-msft commented 6 years ago

So I've definitely noticed this from time to time on my own laptop, which is an HP something or other - though it's definitely not something I've found to be consistently reproducible.

I also know that conhost isn't the one doing this - at least not directly. From the sounds of it, DWM is getting involved during our paint and slowing us down, but I wouldn't have the faintest clue how to start debugging that.

@bitcrazed anyone on the DWM, or composition, or something team that we can forward this issue to to have them take a look?

powercode commented 6 years ago

image

It is not obviously DWM.

powercode commented 6 years ago

It is correlated to font size. Almost grinds to a halt with font size set to 36.

IISResetMe commented 6 years ago

Interestingly enough, I updated my Win10 Pro 1709 Friday night, and now (build 10.0.16299.309) conhost.exe is no longer querying the registry keys mentioned above, but DWM queries this value:

HKCU\Software\Microsoft\Windows\DWM\ColorPrevalence

10 times per scroll. Still digging

powercode commented 6 years ago

fontdrvhost.exe shows up in the traces on machines where this is slow. On machines where this isn't slow, fontdrvhost.exe uses no CPU at all.

And it seems related to scrolling.

powercode commented 6 years ago

image

There is some ping-pong between conhost.exe and fontdrvhost.exe.

powercode commented 6 years ago

conhost_fontdrvhost.zip

A less noisy ETW trace.

powercode commented 6 years ago

The machines where you have seen it being slow, did they all have touch screens?

See the IncDevice!vector scalar destructor in the call stack. That doesn't show up on my machines where this is a lot faster.

bitcrazed commented 6 years ago

Thanks all. Have reached out to the DWM team. Let's see what they say.

powercode commented 6 years ago

@bitcrazed Any news?

bitcrazed commented 6 years ago

We are looking into this. Sorry for the delay - Build prep is eating up a lot of people x hours right now ;)

bitcrazed commented 6 years ago

@powercode - Okay, we have some suspicions as to what's happening here (thanks CD for your help with this 😀)

Could we ask:

  1. What is the size & resolution of your screen? 15.6" @ 3200 x 1800?
  2. What DPI scaling factor are you using?
  3. What is the font face & size of your affected Console(s)?
  4. Does this problem disappear if you decrease your Console font size to ~12pt when running at 200% DPI?

BTW - MANY thanks for capturing traces - they've been ENORMOUSLY helpful!

powercode commented 6 years ago

Resolution 3840x2160. Scaling 250. Both the recommended setup.

I have tried different fonts but have not stumbled upon any that worked, but have not on the other hand made exhaustive tests. Consolas is affected, as is Deja Vu Sans Mono for PowerLine, and several other PowerLine fonts.

Wow! Setting the resolution scaling to 200% improves perf a lot.

Duration         CommandLine
--------         -----------
00:00:15.4034821 ls -recurse   # 250%, 36 pt
00:00:03.1567607 ls -recurse   # 200%, 36 pt
00:00:00.2825821 ls -recurse   # 200%, 12 pt
powercode commented 6 years ago

And @bitcrazed, thank you, and Microsoft, for making it so easy to both gather and look at perf recordings. It is a game changer for me!

bitcrazed commented 6 years ago

Ahhhh! Yeah, you're getting hit by the known issue! Anyone else seeing this, you'll likely see perf return to normal if you keep scaling <= 200% and your font <24pt., though not guarantees.

I found the dev owner yesterday and he was literally working on fix for this issue that surfaced elsewhere but is the same root cause!

Re. XPerf/WPA - I KNOW, RIGHT!! I was talking with aforementioned dev and was recalling how hard it was, back in the dark ages, to diagnose issues like these without the benefit of detailed perf traces!!

Thank goodness for progress ;)

powercode commented 6 years ago

Yes! Thank goodness for progress, and @randomascii, for his incredibly helpful blog posts on analysing the data. Learned so much from him! Huge props!

randomascii commented 6 years ago

I'm always glad to hear when my blog posts are useful.

powercode commented 6 years ago

Any update?

bitcrazed commented 6 years ago

Are you still seeing this issue on recent Insider builds?

powercode commented 6 years ago

That machine is not on insider builds :(

bitcrazed commented 6 years ago

Just checked with the engineer who worked on this: The specific issue you're seeing has been mitigated which should result in you no longer seeing it's effect. A more comprehensive fix is on the backlog and will be triaged into a future release.

To test / confirm, you'll need to install a recent RS5 Insider build, or wait until RS5 ships later this year. Once you do get onto RS5, please update this thread with your findings and close this issue if it's resolved.

Many thanks again for filing and your help in diagnosing this issue.

powercode commented 6 years ago

Installed RS5 - Issue still present.

11 seconds to dir a folder with fondsize 36. 1.8 on size 14. Both horribly slow.

powercode commented 6 years ago

STAFFANX15.08-12-2018.21-38-27.etl.7z.zip

Uploaded trace. It's a 7z archive with an added zip extension to allow the upload.

bitcrazed commented 6 years ago

Thanks. Will take a look.

bitcrazed commented 6 years ago

Hey @Powercode - any chance you could share repro steps - we'd like to see, trace, and measure the perf issue you're seeing.

powercode commented 6 years ago

Start powershell. Set font size to 36. Run ls.

Still on a computer with high resolution (3840x2160) and scaling on 200%.

bitcrazed commented 6 years ago

Hey @powercode. I've been doing some repro'ing on my SP4 with an external 4K 28" screen:

Font @ 18pt, Console @ 81x25 chars Font @ 36pt, Console @ 81x25 chars
Command: dir C:\Windows\System32\ -recurse Command: dir C:\Windows\System32\ -recurse
Start: 12:50:21.47 Start: 12:52:04.63
End: 12:50:23.52 End: 12:52:09.44
Elapsed: 00:00:02.05 Elapsed: 00:00:04.81

FWIW, I used my timing script to measure the elapsed time to execute and render the recursive listing command above.

With the font set at 36pt, there is indeed a marked slow-down, but remember:

  1. Console currently uses GDI to draw text which uses your CPU to render text glyphs
  2. GDI generates and caches glyphs, and then BLITs them onto the screen
  3. The larger the glyph, the fewer can be cached
  4. At 18pt, the client area is 1053x675. At 36pt, the client area is 2025x1350 and the resulting glyphs are 4x bigger ... that's a lot of LARGE bitmaps to BLIT
  5. And since scrolling often results in significant text changes between iterations of the render loop, the entire client area can end up being "dirtied", resulting in the full Console client area having to be fully re-rendered each time! 2025x1350 x 4bytes each pixel == 10MB per frame x 60 fps == 656MB/s we have to force from your CPU to the GPU's frame buffer every second!.

That's a lot of data by anyone's measure, and it makes one's machine do a lot of work:

recursive-dir

Now, can we go faster? HELLS YEAH! 😜

The GDI team recently improved their glyph caching mechanism, with more improvements planned for future releases, but increased the Glyph Cache buffer to mitigate the above factors until those improvements can be made.

Also, Console plans on replacing our GDI renderer to DirectWrite at some point in the future which should eliminate this issue anyhow.

Also we have some other tricks up our sleeves that we hope will noticeably improve text rendering perf, esp. while scrolling. Bear with us ;)

I'll leave this issue open for now, and we'll update it when we have any solid perf improvements to share.

mixmastamyk commented 5 years ago

Hi, understood that there is a lot of graphics to push around on 4k. However, what about the issue mentioned above with querying the registry 10x per scroll? Might be an easy fix to cache that value:

HKCU\Software\Microsoft\Windows\DWM\ColorPrevalence
levicki commented 5 years ago

Hello, I would like to point out that poor console performance in Windows 10 is still present in 1809.

Resolution is 1920x1080, scaling is 100%, console font size is 16pt Consolas.

In my case, running IDA Pro interactive disassembler (idaw.exe) in the console results in a totally absurd situation where conhost.exe is using more CPU than the program itself:

image

Looking at the threads of conhost.exe the culprits are obvious: image

The only part of the console window being updated is highlighted: image

Why is drawing 8 characters at a fixed position using 10% of the 3.2 GHz quad-core CPU, and what is more important, why is this slowing down the console program so much that something which took minutes in Windows XP takes hours in Windows 10?

People were drawing full screen graphics on 1 MHz Motorola 6502 CPU 3 decades ago without dedicated graphics hardware, this is how far modern programmers have fallen.

warrenrumak commented 5 years ago

People were drawing full screen graphics on 1 MHz Motorola 6502 CPU 3 decades ago without dedicated graphics hardware, this is how far modern programmers have fallen.

If you're talking about the Commodore VIC 20 or Commodore 64, they did have dedicated graphics hardware chips, such as the MOS 6567. They were also drawing at something like 160x200 resolution, with 4-bit colour, using an 8x8 font with no anti-aliasing, no back-buffering, no virtualization of memory, no consideration for security, no concept of "processes" or "threads", no constant stream of interrupts coming from connected devices, no API layer or "drivers" to abstract away the hardware, with no ability to protect against or recover from crashes....

I mean, it's really not comparable at all.

Also keep in mind that the number of cores your computer has isn't pertinent to how quickly a single thread of operation will run, especially in the case you're presenting here. Drawing the console isn't multi-threaded. 😊

levicki commented 5 years ago

@warrenrumak

I happen to know which chips were used in computers of that era. Most they could do was simple stuff, scanline tricks, scrolling and hardware sprites. The first real blitter co-processor with programmable minterms which could do arbitrary composition of raster images using DMA that I am aware of in personal computers was in Commodore Amiga.

Commodore 64 to which I was referring had 320x200 pixels screen resolution while the area being redrawn in my example is 162x18 pixels (and I am being generous here by selecting wider area than the actual 8 hex-digit counter):

image

Commodore's MC 6502 also had interrupts to service (vblank, keyboard, joysticks, floppy, sound, etc) and it somehow managed to offer decent single-task experience.

On the other hand, you are telling me that (mostly idle) i7-6700K running at 4 GHz with 32 GB of (mostly unused) ultra-low latency DDR4 3200 MHz RAM, 3.2 GB/sec NVMe SSD and watercooled NVIDIA 1080 Ti clocked at almost 2 GHz is supposed to struggle with (a single-task of) redrawing of 162x18 pixels because of?

Security? Interrupts? High-level software abstractions? You might have as well mentioned Santa Claus stuck in the chimney because that would sound equally ridiculous.

TL;DR -- There are plenty of choices for hardware acceleration of drawing on a modern PC under supposedly modern OS like Windows 10. There are plenty of performance analyzers (Intel's VTune just to name one) and to me it seems none of those methods and tools were actually used to produce the actual code. Please next time put the blame where it belongs.

oising commented 5 years ago

Alright gentlemen, enough with the pissing contest :) I think the answer is clear: @bitcrazed explained above that the glyph rendering is being done by GDI/GDI+, an ancient CPU-bound renderer with its roots in the original NT. Yes, there was hardware acceleration added during the win7/2008r2 era, but clearly this isn't enough anymore. The technological successors to GDI and GDI+ are Direct2D and DirectWrite (mentioned above), which also arrived with win7/2008r2. Whenever this renderer upgrade will happen, who knows...

levicki commented 5 years ago

@oising

Are you suggesting that some other, faster and more modern renderer was used in Windows XP, Windows 7, and Windows 8.1?

Because in those OSes everything is still working as intended (i.e. the console application is not being slowed down more than 25 times).

Is Microsoft's official answer to reporting a serious performance regression in a subsystem used every day by developers and admins who are managing and maintaining their user base really "yeah we are aware of it but whether we will fix it and when, who knows"?

That's a rhetorical question, don't bother.

warrenrumak commented 5 years ago

Are you suggesting that some other, faster and more modern renderer was used in Windows XP, Windows 7, and Windows 8.1?

Because in those OSes everything is still working as intended (i.e. the console application is not being slowed down more than 25 times).

These versions of Windows also have numerous bugs and missing features with their console implementation. Can't go full-screen, can't display the full Unicode character set, can't resize while retaining line-wraps, can't use the full range of VT-100 codes, can't remote the console to another machine..... the list goes on and on.

Rich covered a bunch of this in a blog post earlier this year. If you haven't read it yet, you should: https://blogs.msdn.microsoft.com/commandline/2018/07/20/windows-command-line-inside-the-windows-console/

I understand that it's fun to be snotty on the Internet, Igor... but I think you'll find it's a lot easier to understand an observed performance regression in Windows 10's console when you contextualize it within the vast amount of dev work the team is putting in to modernize this part of Windows.

mixmastamyk commented 5 years ago

Believe these problems and complaints are caused by the development strategy chosen for this project. (Almost) everyone loves the new work you've done, but no one wants to deal with new bugs or performance regressions in a component that is entrenched and hasn't been touched in twenty-ish years (until recently).

With some hindsight it's probably better to revert console back to legacy and start fresh on a modern DirectX/Unicode/Fonts first, Posix PTY model, replacement. This is likely possible without throwing out much work. For example the bridges built to run console apps over ssh, ConPTY, and ANSI parsing should continue to be used, perhaps in a different context.

This strategy should make more folks happy. End users that won't have to deal with new bugs in the old stuff, as well as developers who won't have to step around eggshells in the new stuff. Trying to do both in a single code-base makes things harder for everyone.

warrenrumak commented 5 years ago

With some hindsight it's probably better to revert console back to legacy and start fresh on a modern DirectX/Unicode/Fonts first, Posix PTY model, replacement. This is likely possible without throwing out much work. For example the bridges built to run console apps over ssh, ConPTY, and ANSI parsing should continue to be used, perhaps in a different context.

This strategy should make more folks happy. End users that won't have to deal with new bugs in the old stuff, as well as developers who won't have to step around eggshells in the new stuff. Trying to do both in a single code-base makes things harder for everyone.

If you need the pre-Windows 10 console for compatibility reasons, choose it in Properties.

image

zadjii-msft commented 5 years ago

I know it always seems like a fine and dandy idea to say "hey we're going to break back-compat for thousands of existing apps so we can add some new features for people" but we like to hold ourselves to a higher standard than that. We've found a path where we can do good for all of the existing users of the console codebase, and slowly introduce new features and modernize, without leaving customers behind.

Sure, we could have deprecated the old console and parked the console subsystem and implemented a new PTY subsystem. That's a path we could have taken, but that would have left everyone with an existing console application asking "why did Microsoft deprecate their entire console API with no plan to modernize those existing apps? Are they fools?"

We're just now in a place where the console subsystem can support conpty, where it can support other rendering heads, and where the backing buffer could possibly support emoji and other unicode codepoints. It might not be the path that makes you the most happy, but it's the path that makes the most developers the most happy.

We're working on a new rendering head. No commitments as to when it'll be finished, but we're working on it. When it's done, then that will likely resolve the performance issues described in this thread. I haven't personally debugged into this issue, but my gut agrees with Rich's analysis that the issue is coming from GDI. I certainly wouldn't call this a major perf regression, and I doubt it's due to any of our code in particular. It's likely just due to a really old CPU-bound graphics API that's not optimized for running on new hardware.

As @warrenrumak mentioned, if you're unhappy with how the console is performing, go ahead and switch back to legacy mode.

mixmastamyk commented 5 years ago

console application asking "why did Microsoft deprecate their entire console API with no plan to modernize those existing apps? Are they fools?"

I don't believe so. Folks want their twenty year old stuff to work without regressions. Others would like new modern features without restrictions. Neither is asking specifically for a Frankenstein hybrid that risks the former and limits the latter. Sure, the latter group will accept it, when there's no other option. But, that's not saying much.

ANSICON (and colorama etc.) already exist and are free. So why not leave the past in the past?

warrenrumak commented 5 years ago

ANSICON (and colorama etc.) already exist and are free. So why not leave the past in the past?

They have. Don't struggle with this, Mike. Again, if your mythical "twenty year old stuff" doesn't work with the new Windows 10 console, by all means, use the old console, link in ANSICON, and keep relying on monkeypatching GetProcAddress (euughhhh).

mixmastamyk commented 5 years ago

@warrenrumak, obviously they haven't or this bug wouldn't exist.

DHowett-MSFT commented 5 years ago

The console is a very polarizing Windows component--everybody has an opinion on how we should run the team, where we should make our investments, and what we should put on our roadmap. We can't do everything, but we also can't do nothing and remain technologically relevant in 2018. The v1/legacy console is representative of this polarization. Please bear with us as we try to improve the state of the art for everybody.

As always, thanks for reporting this. We'll definitely be triaging this with the rest of our bug fixes. In the meantime, however, please keep the discourse civil.