microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License

WT should start up fast: profile the startup path and trim anything that takes a while #5907

Open ghost opened 4 years ago

ghost commented 4 years ago

Steps to reproduce

  1. Click to launch Windows Terminal

Expected behavior

Windows Terminal should be ready instantaneously like windows console, or like Sublime Text. Windows Terminal can't be slower than windows console.

While Windows Terminal is fast compared with other tools like Visual Studio, or iTunes, it is still not fast enough for a Terminal application.

Actual behavior

It takes too long to startup. It is not ready instantaneously. It's not as fast as windows console.


maintainer note: hijacking OP for task list:

zadjii-msft commented 4 years ago

I mean, the Terminal is doing a lot more than the console ever was. I'm not sure there's much more we can do to optimize our UI setup. Conhost was using basically the simplest Win32/GDI interface possible, and the Terminal needs to stand up a XAML stack. Even if we somehow had a server process that already had the settings pre-loaded, we'd still need to stand up the UI stack.

At least the Terminal is faster at processing output than the console ever was, and opening new tabs/panes is certainly faster than opening a new conhost is.

Maybe there's something we can do here to optimize the creation of the XAML stack.


7/21/2022 edit: putting this here so it doesn't ping everyone on this thread.

While investigating another issue:

image

ghost commented 4 years ago

> and the Terminal needs to stand up a XAML stack

XAML stack means UWP?

mdtauk commented 4 years ago

> > and the Terminal needs to stand up a XAML stack
>
> XAML stack means UWP?

This uses a Xaml Island for now, but with WinUI 3.0 Xaml will not be tied to UWP, and can be used with Win32 code

zadjii-msft commented 4 years ago

To be technically correct - the Terminal is a Win32 application that's using Xaml Islands to host UWP XAML content in its window, and is (typically) run as a packaged application.

The lines between what constitutes a "UWP" and a "Win32" application are becoming more and more blurred every day, and the Terminal is a great example of a hybrid application that can utilize both technologies.

AnuthaDev commented 4 years ago

@zadjii-msft Would it be possible to eliminate the XAML stack entirely from the equation, using a combination of DirectX/DirectComposition/Win32 technologies to create the UI? I mean, Terminal really doesn't "require" XAML; most of its UI is pretty straightforward and should be pretty easy to do in C++. The most important part, I believe, is the TermControl that is needed for a minimum viable product. Have you investigated this scenario? Is there interest in it? I would love to help make this happen. But I understand that would require significant resources, and that currently, optimizing XAML is the best bet.

zadjii-msft commented 4 years ago

I mean, that's a possibility, sure, but I think as the Terminal UI gets more elaborate, using DComp for the entire UI is going to be less and less feasible. Plus, if we do want 3rd party developers writing extensions to provide their own UI elements for the Terminal (see #4000), it'd probably be more developer-friendly to ask them to write XAML components rather than DComp visuals

ghost commented 4 years ago

> To be technically correct - the Terminal is a Win32 application that's using Xaml Islands to host UWP XAML content in its window, and is (typically) run as a packaged application.
>
> The lines between what constitutes a "UWP" and a "Win32" application are becoming more and more blurred every day, and the Terminal is a great example of a hybrid application that can utilize both technologies.

It looks and feels like a UWP app though. Maybe that's the problem.

mdtauk commented 4 years ago

> @zadjii-msft Would it be possible to eliminate the XAML stack entirely from the equation, using a combination of DirectX/DirectComposition/Win32 technologies to create the UI? I mean, Terminal really doesn't "require" XAML; most of its UI is pretty straightforward and should be pretty easy to do in C++. The most important part, I believe, is the TermControl that is needed for a minimum viable product. Have you investigated this scenario? Is there interest in it? I would love to help make this happen. But I understand that would require significant resources, and that currently, optimizing XAML is the best bet.

Part of the idea for this project is to modernise the terminal UI and feature set, supporting multiple console types, and show the best of Windows.

Choosing not to implement the modern UI stack is not within that scope. Windows 10X's shell has moved to XAML, and XAML is being decoupled from the OS and open sourced, so the community can push it forward without relying on new OS updates.

By the end of the year, all the code should be on GitHub, and then the community can explore how to improve performance.

The app is using C++ and so is XAML. Moving off the Island may bring some perf benefits by default.

mdtauk commented 4 years ago

> > To be technically correct - the Terminal is a Win32 application that's using Xaml Islands to host UWP XAML content in its window, and is (typically) run as a packaged application. The lines between what constitutes a "UWP" and a "Win32" application are becoming more and more blurred every day, and the Terminal is a great example of a hybrid application that can utilize both technologies.
>
> It looks and feels like a UWP app though. Maybe that's the problem.

It is the direction Windows is moving in, so it's not that it looks like UWP; Windows is moving towards that UI everywhere.

ghost commented 4 years ago

> It is the direction Windows is moving in, so it's not that it looks like UWP; Windows is moving towards that UI everywhere.

Well, I can't really do anything about that other than to give feedback as I am doing, and avoiding Windows updates and eventually moving out of the Windows platform.

AnuthaDev commented 4 years ago

> I mean, that's a possibility

So you're telling me there's a chance 🤣😅

I hope WinUI 3.0 alleviates this situation.

DHowett-MSFT commented 4 years ago

@AnuthaDev it’s nowhere near ready for primetime use (by developers who we can’t go bother when stuff goes wrong), but this repository does produce a WPF control that’s really just a standard Win32 HWND with the terminal surface on it. It’s pretty much the DirectWrite renderer wired up to a surface.

Our long term plan sees us producing composable controls other developers can integrate into their own experiences.

If only we had all the time in the world :smile:

AnuthaDev commented 4 years ago

> If only we had all the time in the world :smile:

@DHowett-MSFT Okay, serious question. Suppose somebody else does the entire work: what would your preference be, XAML or Win32 HWNDs (provided that Win32 significantly reduces memory consumption and startup time)?

mdtauk commented 4 years ago

WinUI Desktop will use HWNDs for its window implementation, but will use XAML UI (which uses DirectX to render to the screen)

oising commented 4 years ago

> Well, I can't really do anything about that other than to give feedback as I am doing, and avoiding Windows updates and eventually moving out of the Windows platform.

@phgmacedo Why would you avoid future Windows updates? WinUI 3.0 brings many benefits, and as a dev, you're never going to be forced to update WinUI either. Opting out of Windows updates seems to be counter-productive in general - or are you just making an orthogonal, abstruse gesture about your displeasure? It may be more effective to visit the new WinUI 3.0 site and assuage your worries.

AnuthaDev commented 4 years ago

> WinUI Desktop will use HWNDs for its window implementation, but will use XAML UI (which uses DirectX to render to the screen)

Ah, there's a difference between HWNDs and HWND. In XAML there is only a parent HWND window, while in classic Win32 every control has an HWND.

mdtauk commented 4 years ago

> > WinUI Desktop will use HWNDs for its window implementation, but will use XAML UI (which uses DirectX to render to the screen)
>
> Ah, there's a difference between HWNDs and HWND. In XAML there is only a parent HWND window, while in classic Win32 every control has an HWND.

Yea, that is a very classic windows concept. Every control is also a Window.

@oising I think he is maybe expressing a preference for the Win32 visual look, compared to the Windows 10/WinUI look. Could be the density of controls, or just an old school familiarity with WinForms / WPF.

DHowett-MSFT commented 4 years ago

> density

So, WinUI 2 started to offer a "Compact" sizing dictionary that changes control sizes to more closely match classic Win32. That might be worth investigating.

DHowett-MSFT commented 4 years ago

After testing: on account of we don't have too much UI right now it doesn't really help us.

mdtauk commented 4 years ago

Density keeps to the 32px min height for touch controls like buttons and text boxes I believe. The flyouts from the Add Tab button could be affected, but until WinUI 3, flyouts are not affected by setting a compact density.

DHowett-MSFT commented 4 years ago

Now, I'm going to mark and minimize all the complaining about our choice in UI framework as off-topic and turn this into the issue for "make sure terminal launches fast". Kay? Kay.

vannomad commented 3 years ago

Would running WT in tray be an option? It would not necessarily improve "startup" but it would make it unnoticeable. Would also fix quake mode not running unless there's a WT window open.

zadjii-msft commented 3 years ago

Sure, that's more of a request we're working on over in #9996.

rushfan000 commented 3 years ago

@vannomad It's well known that windows terminal is extremely slow.

I've been using wezterm for a while. It does not maintain a tray icon, and it is lightning fast, starts up instantly! And even then people complain that wezterm is slow. It most definitely is not slow compared to Windows Terminal.

ssylvan commented 3 years ago

One thing I didn't see mentioned: Terminal currently starts up slow enough that if you hit enter and start typing, you will not only lose key-presses, you will lose focus, meaning that terminal will never get any input until you click the window.

I.e. you type terminal in the start menu, hit enter, then type "dir" or whatever, that first 'd' will happen long before terminal is actually active, so it gives focus to something else. Where does that keypress go? The "next" window? I'm not sure! Wherever it goes, it activates that window, which then prevents Terminal from getting focus once it gets around to actually launching. And at that point, Terminal will just be an inactive window that doesn't get any input until you take some manual action to give it focus.

So, at a bare minimum: Terminal needs to start up and immediately take focus, then buffer all the input keys so they can be fed to the command line. This needs to happen fast enough that I don't lose focus.
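A minimal sketch of the buffering idea above, written in Python for brevity (Terminal itself is C++). `EarlyInputBuffer`, `on_key` and `attach` are hypothetical names, not anything from the Terminal codebase: keys typed before the shell connection exists are queued and then replayed in order once it attaches.

```python
from collections import deque
from typing import Callable, Deque, Optional


class EarlyInputBuffer:
    """Queue keystrokes until a sink (e.g. the shell's input pipe) is attached."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._sink: Optional[Callable[[str], None]] = None

    def on_key(self, key: str) -> None:
        # Before the sink exists, hold the key instead of dropping it.
        if self._sink is None:
            self._pending.append(key)
        else:
            self._sink(key)

    def attach(self, sink: Callable[[str], None]) -> None:
        # Replay everything typed during startup, in order, then go live.
        self._sink = sink
        while self._pending:
            sink(self._pending.popleft())


received = []
buf = EarlyInputBuffer()
for ch in "dir":          # typed while the app is still starting
    buf.on_key(ch)
buf.attach(received.append)
buf.on_key("\r")          # typed after startup, delivered directly
```

This only covers the buffering half. It assumes the window already holds focus so the keys arrive at all, and it ignores thread safety, which a real input path would need.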

That's the bare minimum, but really in 2021 you shouldn't be able to say "one Mississippi" before a simple app like Terminal is not only launched and activated, but fully initialized. That's an eternity. Really, terminal should be up and running and fully initialized before the key-up event on the enter key that launched it. I realize this is a hard problem because it's very likely the problem is in the UWP/XAML stack. Indeed, calculator suffers from the exact same problem these days, where the new version of the app loses input because it is just sooooo slow to launch. So I get that this may not be something you can entirely fix on your own, but you can add a requirement to the XAML folks to improve their startup performance (as well as do whatever you can do on your own side to mitigate the problem).

Re: the suggestion above about removing XAML, I don't think it's as crazy as it may seem. Indeed, I would argue that taking a XAML dependency in Terminal (and Calculator) before verifying that its performance was acceptable was a mistake. Backing out of that mistake until XAML achieves acceptable performance for core apps like Terminal and Calculator is not at all unreasonable IMO. It really doesn't seem like you actually need XAML for the core window/terminal area at all (maybe keep it for settings and initialize it asynchronously?). It's obviously preferable if XAML can be improved, but if it doesn't improve, the priority should be to meet your users' needs (even if that means not using some newfangled UI frameworks that aren't yet up to par).

DHowett commented 3 years ago

> One thing I didn't see mentioned: Terminal currently starts up slow enough that if you hit enter and start typing, you will not only lose key-presses, you will lose focus, meaning that terminal will never get any input until you click the window.

Yeah, I'm none-too-pleased about that. We're booking launch time perf into 1.11 thanks to this and other discussions (and regressions :|)

ssylvan commented 3 years ago

Just did a quick trace with some selected events: image

So while XAML is indeed taking up a fair amount of time, it doesn't seem like it's responsible for most of it. For example, do we really need to wait until >1s in before we launch the processes for conhost and cmd.exe? Or could we launch them right away (in parallel with all the other processes that launch, and in parallel with XAML initialization)? And why is this ScriptedSandbox64 launched so late compared to the rest of the processes? (No idea what that is.)

Note that nothing really happens until 240ms in, so I guess that's the sort of "floor" of how long windows takes to launch anything (and maybe launching via VS is an issue here too).

Looking at the UI thread: image

Again we see a big gap before anything happens at all on the UI thread, but then there's another big gap in the center before we reach the steady state rendering at the end. That middle gap seems to be mostly about the setup that happens in the render thread initialization (part of this is a slow initial render I think?): image

Perhaps the render thread could be initialized at the very beginning of the process launch rather than waiting for XAML (and conhost/cmd.exe etc.) to finish first? So that it's (hopefully) ready to go by the time it's needed? Seems like it currently needs the swapchain to initialize, so I'd recommend trying to refactor that so you can get most things initialized (esp. DirectX, DirectWrite, etc.), and then resize at the end and wire it up to XAML.
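As a rough illustration of the reordering suggested above, here is a Python sketch (not Terminal's actual C++ startup code). All three `init_*`/`launch_*` functions are hypothetical stand-ins. Work that does not depend on the UI is started first, so it overlaps with UI setup, and only the size-dependent swapchain step waits until the end.

```python
from concurrent.futures import ThreadPoolExecutor


def init_render_resources() -> dict:
    # Stand-in for device/text-stack setup that does not need the window yet.
    return {"device": "ready", "swapchain": None}


def launch_shell() -> str:
    # Stand-in for spawning conhost and the shell process.
    return "shell-started"


def init_ui() -> dict:
    # Stand-in for UI framework setup on the main thread.
    return {"width": 120, "height": 30}


def start_terminal() -> dict:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Kick off UI-independent work first so it overlaps with UI setup.
        renderer_future = pool.submit(init_render_resources)
        shell_future = pool.submit(launch_shell)

        ui = init_ui()  # runs on the calling thread meanwhile

        renderer = renderer_future.result()
        shell = shell_future.result()

    # Only the size-dependent step has to wait for the UI.
    renderer["swapchain"] = (ui["width"], ui["height"])
    return {"renderer": renderer, "shell": shell, "ui": ui}


state = start_terminal()
```

Whether this helps in practice depends on how much of the renderer setup can really be separated from the swapchain, which the trace alone does not show.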

vadimkantorov commented 2 years ago

One of my takeaways from https://github.com/microsoft/terminal/issues/6409 / https://github.com/microsoft/microsoft-ui-xaml/issues/2648 was that XAML stack also loads a lot of useless libraries (even including Maps control) and probably this requires at least some useless accesses to hard disk

bozhodimitrov commented 2 years ago

I just want to add feedback for this particular issue:

Is that the case for anyone else or it's just me?

zadjii-msft commented 2 years ago

> Idk what is going on

Ultimately, that's what's important: knowing what is going on here. Measurements of how long it takes on various CPUs aren't helpful. Actual traces of the startup, which can identify the bits of startup that are taking the longest, are what's actually important to this thread. I've got some traces higher up that we're starting with. We'll start there, unless someone can narrow down some part of XAML init that's more costly that we can trim out.

bozhodimitrov commented 2 years ago

I assume that there is something related to Xaml, because while searching on google/github/reddit for this issue, I saw this strange behavior happening at the start bar for several seconds, before the proper WT icon and title gets in place:

image

Versus

image

So I assume that it might be related. But sadly Idk how to debug further. I will try to search for additional instructions on how to run a debug build of Windows Terminal or running some kind of profiling.

Btw, I removed the Azure profile, since I don't need it for now, I have only the default CMD and PowerShell profiles. I even added the -noLogo argument for PowerShell in order to avoid the greeting news message that PowerShell provides by default.

PS: I assume that most users use the suspended version of Windows Terminal, which is always running/sleeping in the background and this is why most of them don't notice the cold startup time.

lhecker commented 2 years ago

It's probably a good idea to save this here just in case we need it later... (WPR trace of starting 10 instances of Windows Terminal Preview 1.16.2142.0.)

image

Edit: The JumpList and cross process COM parts have now been fixed in #13692, which is part of version 1.16.

oising commented 2 years ago

I vote to have jumplist generation opt-in via settings. Seems like a quick win. Who uses jumplists with terminals that much? Is there telemetry on this?

zadjii-msft commented 2 years ago

> jumplist generation opt-in via settings

Yea I'm not gonna do that. #576 was one of the biggest feature requests. Definitely not gonna disable that by default. A more sensible fix might be to wait until after TerminalPage::_CompleteInitialization to kick off the first UpdateJumplist. I know we're requesting it on a background thread, but I guess presumably, that just creates a thread that could ask for time slices before we've got the window on the screen.
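The "wait until after initialization" idea can be sketched generically. This is Python for brevity, and `StartupTasks`, `run_when_ready` and `complete_initialization` are hypothetical names rather than Terminal APIs. Non-essential work such as the jumplist update is queued and only released once the window has been shown.

```python
from typing import Callable, List


class StartupTasks:
    """Hold non-essential work until the window is on screen."""

    def __init__(self) -> None:
        self._initialized = False
        self._deferred: List[Callable[[], None]] = []

    def run_when_ready(self, task: Callable[[], None]) -> None:
        # After startup, run immediately; during startup, queue for later.
        if self._initialized:
            task()
        else:
            self._deferred.append(task)

    def complete_initialization(self, show_window: Callable[[], None]) -> None:
        show_window()
        self._initialized = True
        deferred, self._deferred = self._deferred, []
        for task in deferred:
            task()


log: List[str] = []
startup = StartupTasks()
startup.run_when_ready(lambda: log.append("jumplist updated"))
startup.complete_initialization(lambda: log.append("window shown"))
```

In the real application the deferred task would presumably still be dispatched to a background thread; the sketch only shows the ordering.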


Another optimization we discussed at least on Teams - stashing the initial size in state.json somewhere. Any time we hot-reload the settings, kick a BG thread to evaluate how big the window should be for the initialRows/Cols and the default profile's[1] font settings. On launch, we could avoid the cost of looking up the font to do that math most of the time, just use the dip we already precalculated. That would cause the initial size to be wrong in the case that someone edited the settings.json with the Terminal closed, sure. But that seems like it'd save startup cost most of the time, so it'd be a viable optimization

[1]: heck typing this up, I had better ideas. We could cache each profile's startup size, sure. OR we could just encode into json the DIP for various fontFace/size pairs. Like, { "Cascadia Code": {"12": "8,13"}, ...}. And then regardless of the profile launched, look up the DIP from that precalculated cache. That would only be wrong if the font itself changed, and the cache would only miss again, when the settings are changed w/o the Terminal open.
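A small Python sketch of the precalculated cache described in the footnote. It uses the same JSON shape as the example there, and it assumes that a value such as "8,13" means "cell width,height", which the comment does not spell out. `cell_size` and `slow_measure` are hypothetical names; `slow_measure` stands in for the expensive font lookup.

```python
import json
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, Dict[str, str]]


def cell_size(cache: Cache, face: str, size: int,
              measure: Callable[[str, int], Tuple[int, int]]) -> Tuple[int, int]:
    """Return the cached cell size, measuring and caching only on a miss."""
    entry = cache.get(face, {}).get(str(size))
    if entry is not None:
        width, height = entry.split(",")
        return int(width), int(height)
    width, height = measure(face, size)
    cache.setdefault(face, {})[str(size)] = f"{width},{height}"
    return width, height


calls: List[Tuple[str, int]] = []


def slow_measure(face: str, size: int) -> Tuple[int, int]:
    # Placeholder for the real font lookup; records that it was called.
    calls.append((face, size))
    return (9, 15)


cache: Cache = json.loads('{"Cascadia Code": {"12": "8,13"}}')
hit = cell_size(cache, "Cascadia Code", 12, slow_measure)   # served from cache
miss = cell_size(cache, "Consolas", 11, slow_measure)       # measured, then cached
```

As the footnote notes, an entry like this would only go stale if the font file itself changed, so a real version would need some way to invalidate it.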

DHowett commented 1 year ago

Discussion idea:

vadimkantorov commented 1 year ago

Would it be possible to also measure / provide publicly counters of hard page faults / L3 cache misses during startup / amount of committed RAM / amount of accessed RAM / amount of disk accessed including loading of shared XAML libraries (this is especially important for systems with slow disk) of Terminal startup (at Windows startup and at a later stage)?

Such a helper harness would also probably be a good Windows perf programming example :)

A bit fantasizing, but some of the above should also be correlated with progress

zadjii-msft commented 1 year ago

That's a discussion we're already tracking over in #6409. I'm gonna collapse these two as off topic - feel free to continue the conversation over in that thread.

vadimkantorov commented 1 year ago

Well, that discussion ended with a suggestion that I go debug Terminal with the XAML team. I have no burning need for that, because I upgraded to a laptop with an SSD, so I no longer have access to that slow system and Terminal is working okay for me now. It's not problematic enough for me to take on this kind of perf debugging on my own.

When I found this umbrella issue, I thought the discussion in that issue was relevant within the scope of this one as well, so I brought up the memory-related points again. But if they're irrelevant here, I'm fully okay with marking this off-topic, no probs.

vadimkantorov commented 1 year ago

https://github.com/microsoft/terminal/issues/15001 describes a related use case: at startup, Windows runs some autoruns that do a quick cmd.exe some_script.cmd, which print nothing and require no user input. This spins up many Terminal instances, and it's quite slow. What's special about this use case is that if the script completes quickly, doing the full XAML load was wasted work. So if full rendering could be delayed, and then skipped completely when the process exits within 500ms, it'd be a big win.
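A sketch of that "delay, then skip" idea in Python. A `threading.Event` stands in for "the client process has exited", and `run_with_deferred_ui` is a hypothetical helper, not a Terminal function. The window is only created if the client is still alive after the grace period.

```python
import threading
from typing import Callable, List


def run_with_deferred_ui(exited: threading.Event,
                         create_window: Callable[[], None],
                         grace_seconds: float = 0.5) -> bool:
    """Return True if a window was created, False if the client exited first."""
    # Event.wait returns True if the event was set before the timeout elapsed.
    if exited.wait(timeout=grace_seconds):
        return False
    create_window()
    return True


windows: List[str] = []

quick = threading.Event()
quick.set()  # simulates a script that has already finished
quick_result = run_with_deferred_ui(quick, lambda: windows.append("quick"))

slow = threading.Event()  # never set: simulates a long-running client
slow_result = run_with_deferred_ui(slow, lambda: windows.append("slow"),
                                   grace_seconds=0.01)
```

The trade-off is that an interactive session would see its window up to the full grace period later, and any output produced in the meantime would have to be buffered.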

lhecker commented 1 year ago

With the recent barrage of improvements, I've made a new perf trace today. This time using Nvidia Nsight Systems, because it has a neat way to represent delays (zoom in as needed):

trace

Of the remaining ~320ms (originally ~400ms) launch cost of Windows Terminal, about 240ms (75%, originally 60%) are due to WinUI and XAML. There are some things we can do about that, but it'll be very difficult, because WinUI isn't exactly easy to manipulate into being lean. For instance, the C++ XAML generator has a bug, where it doesn't emit metadata for system types into our metadata cache. When WinUI then starts, it tries to look up those system types, can't find them and will look around in all registered user providers. This causes Microsoft.Terminal.Settings.Model.dll and everything else to be loaded, which takes ~10% (maybe more). Preventing this isn't easily possible, because creating 2 metadata caches (one for WT and one for the settings model) isn't documented and probably not supported. Most of the time is spent in the layout and rendering code^1 though and that's an area we can't improve upon.

Another 80ms (20%) are caused by our workaround for #11648, a bug which still isn't fixed in Windows unfortunately. I've been thinking about just adding a setting that will load the nearby fonts if the setting is enabled. I don't think caching the font size is a good idea because that would only improve launch time by 10% instead of the expected 20%. It would worsen the user experience for some but improve it for others. This is what I'd like to fix asap. It's the easiest improvement we can make at this point. Edit: this is now fixed.

The remaining 80ms (20%) will be difficult to fix. For instance, HWND creation costs 20ms and we need at least 2 (main thread + 1 for each window). Setting up the Monarch COM server and negotiating that costs another 5-10ms, by nature of COM setup being slow. We allow loading fragments via the app extension catalog, which is an LPC and extremely slow (10ms for returning an empty list). That's almost the entire cost already.
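Adding up the figures quoted in that paragraph shows how much of the 80ms they cover. The sketch takes the minimum of 2 HWNDs and treats the 5-10ms COM cost as a range. The constant names are illustrative only.

```python
# Figures quoted in the comment above, in milliseconds.
HWND_COST_MS = 20
HWND_COUNT = 2                 # "at least 2"
MONARCH_COM_MS = (5, 10)       # quoted as a 5-10ms range
EXTENSION_CATALOG_MS = 10      # empty-list lookup
REMAINING_BUDGET_MS = 80

low = HWND_COST_MS * HWND_COUNT + MONARCH_COM_MS[0] + EXTENSION_CATALOG_MS
high = HWND_COST_MS * HWND_COUNT + MONARCH_COM_MS[1] + EXTENSION_CATALOG_MS

print(f"accounted for: {low}-{high}ms of {REMAINING_BUDGET_MS}ms")
```

With these numbers, the three named items cover 55 to 60ms of the 80ms.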

vadimkantorov commented 1 year ago

As Terminal users, we can upvote certain issues/bugs (that are preventing Terminal from being faster) on XAML github if they have github repo and you link these bugs :) same for any linked Feedback items about general appx slowness which you consider slowing Terminal.

Also, earlier I noted that my Terminal loads a ton of unrelated DLLs, including something related to Maps controls.

eduarddejong commented 1 year ago

If I may add anything functional here. I can imagine that a possible solution would be to differentiate between 2 different ways of starting the Terminal:

  1. Opening the application by explicitly clicking it as a user, whether that is via a shortcut or the right click menu in a folder.
  2. Automatic launches by several console application executions when being configured as the default terminal in Windows, which would otherwise launch the old conhost.exe console window of windows.

When launching the first way, I love features, and startup time does not matter that much.

When launching the second way, the application should be absolutely blazingly fast, loading no more resources than it needs, because these launches can happen many times in a row, for example when running any kind of automated batch operation.

I don't know if this is possible, but it might be helpful as an idea.