wdscxsj closed this issue 6 months ago.
Can you please install our nightly ("canary") release? You can find it here: https://aka.ms/terminal-canary-installer
Afterwards please take the following steps:
If the memory usage drops after the last step and only after the last step, we can already be extremely certain that it's due to your graphics driver.
However, we can debug it further if you'd like. There are two ways to do so:
Thanks for your detailed response! I've tried again with Terminal Canary, and the result is roughly the same as before. With 10 tabs open:
I also suspect it's due to the graphics driver. This Intel Arc graphics card is not yet recognized by the latest GPU-Z...
A full memory dump would be around 100 GB. So I ran VMMap as admin, and this is a screenshot of the Total memory:
None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view still shows 1 Read/Write for each 65,536 KB Private Data region.
The top ASLR Image is igc64.dll (65.7 MB file size) from the graphics driver, the Intel Graphics Shader Compiler for Intel(R) Graphics Accelerator. It's also loaded by dwm, IGCC, Chrome, VSCode, etc.
I guess it's better to stay with WARP until an updated driver brings good luck, right?
> A full memory dump would be around 100 GB.
I only meant a dump of WindowsTerminal.exe. It should only be ~1028 MB, as you noted. However, I've since realized that this isn't needed anyway. There are better ways to investigate the issue...
If you have Windows Performance Recorder (WPR) installed, you can
It can then be debugged in the Windows Performance Analyzer (WPA). You can find the latter in the app store, which I believe should also install the former. In any case, this will net us something like this:
It would tell us exactly where it's coming from. I probably don't have access to the symbols for Intel's drivers, but I know people who do, so I could send it to them. If you want to do such a WPR trace, I'd be happy to check it out!
> None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view still shows 1 Read/Write for each 65,536 KB Private Data region.
Any allocation via VirtualAlloc is labeled with "Thread Environment Block" for whatever reason. Only those with an ID next to them are actual TEBs and refer to stack memory. Since your allocations don't have an ID, they must be VirtualAlloc calls with a 64 MiB size. If I had to take a guess, I suspect that Intel's driver is using arena/linear allocators and forgot that you're supposed to MEM_RESERVE the address space and only then gradually MEM_COMMIT it.
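The reserve-then-commit pattern described above can be sketched roughly as follows. This is a minimal, hypothetical C++ illustration, not Intel's driver code or anything from Windows Terminal: the class name `ReservedArena` is invented, alignment handling is omitted for brevity, and the POSIX `mmap`/`mprotect` branch is only an approximate analogue of `MEM_RESERVE`/`MEM_COMMIT`, included so the sketch also builds outside Windows. The idea is that reserving 64 MiB of address space costs almost no committed memory; pages are committed only as the bump pointer advances.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>  // POSIX stand-in for VirtualAlloc in this sketch
#include <unistd.h>
#endif

// Hypothetical linear ("bump") arena: reserves address space up front and
// commits pages lazily as the allocation cursor advances.
class ReservedArena {
public:
    explicit ReservedArena(std::size_t reserveBytes) : reserved_(roundUpToPage(reserveBytes)) {
#ifdef _WIN32
        // MEM_RESERVE: claim address space only; no commit charge yet.
        base_ = static_cast<std::uint8_t*>(
            VirtualAlloc(nullptr, reserved_, MEM_RESERVE, PAGE_NOACCESS));
#else
        // An inaccessible (PROT_NONE) mapping acts roughly like a reservation.
        void* p = mmap(nullptr, reserved_, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<std::uint8_t*>(p);
#endif
    }

    ~ReservedArena() {
        if (base_ == nullptr) return;
#ifdef _WIN32
        VirtualFree(base_, 0, MEM_RELEASE);
#else
        munmap(base_, reserved_);
#endif
    }

    ReservedArena(const ReservedArena&) = delete;
    ReservedArena& operator=(const ReservedArena&) = delete;

    // Bump-allocates `bytes`. Commits more pages only when the cursor crosses
    // the currently committed boundary. Returns nullptr when the arena is full.
    void* allocate(std::size_t bytes) {
        if (base_ == nullptr || bytes > reserved_ - used_) return nullptr;
        const std::size_t newUsed = used_ + bytes;
        if (newUsed > committed_) {
            const std::size_t newCommitted = roundUpToPage(newUsed);
            if (!commit(base_ + committed_, newCommitted - committed_)) return nullptr;
            committed_ = newCommitted;
        }
        void* result = base_ + used_;
        used_ = newUsed;
        assert(used_ <= committed_ && committed_ <= reserved_);  // invariant check
        return result;
    }

    std::size_t reservedBytes() const { return reserved_; }
    std::size_t committedBytes() const { return committed_; }

private:
    static std::size_t pageSize() {
#ifdef _WIN32
        SYSTEM_INFO info;
        GetSystemInfo(&info);
        return info.dwPageSize;
#else
        return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
    }

    static std::size_t roundUpToPage(std::size_t n) {
        const std::size_t page = pageSize();
        return (n + page - 1) / page * page;
    }

    // MEM_COMMIT (mprotect on POSIX): back these pages with actual memory.
    static bool commit(void* addr, std::size_t bytes) {
#ifdef _WIN32
        return VirtualAlloc(addr, bytes, MEM_COMMIT, PAGE_READWRITE) != nullptr;
#else
        return mprotect(addr, bytes, PROT_READ | PROT_WRITE) == 0;
#endif
    }

    std::size_t reserved_ = 0;
    std::size_t committed_ = 0;
    std::size_t used_ = 0;
    std::uint8_t* base_ = nullptr;
};
```

With a 64 MiB `ReservedArena`, a 100-byte allocation commits a single page. By contrast, one `VirtualAlloc(nullptr, 64 << 20, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)` call commits all 64 MiB immediately, which would be consistent with the fully committed 65,536 KB regions shown in VMMap.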
However, it's very suspicious that only we're affected and no one else. One thing you could try instead of using WARP is to set the "Graphics API" to "Direct2D" (and with WARP disabled).
Thank you very much! I've learned a lot again. The download link for a WPR trace file with a full memory dump has been sent to your email.
To explain how WPA is commonly used... When you open it, it'll look something like this:
Each tab can contain an arbitrary number of panes. When you click on the graph types on the left, new panes will be added to the current tab. As such, I usually first close all tabs and then open the graph that I want. In this case we want the "Virtual Allocations" graph which is in the "Others" section on the left. This will list all the processes that were recorded:
In the table view, everything to the left of the vertical yellow line are columns which group data and everything to the right of the yellow line is an aggregate. It's a little bit like working with a database. Basically, everything to the left "maps" / "groups" and everything to the right "reduces" / "sums".
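As a rough analogy only (this is not how WPA is implemented), the group/aggregate split behaves like a SQL `GROUP BY` with `SUM`. The C++ sketch below groups hypothetical allocation rows by process and stack (the columns left of the yellow line) and sums their sizes (the column right of the yellow line). `AllocationRow`, `GroupKey`, and `groupAndSum` are illustrative names, and the row values in any usage are made up.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row of an allocation table.
struct AllocationRow {
    std::string process;      // grouping column (left of the yellow line)
    std::string stack;        // grouping column (left of the yellow line)
    std::uint64_t sizeBytes;  // aggregated column (right of the yellow line)
};

using GroupKey = std::pair<std::string, std::string>;

// "Map"/"group" rows by (process, stack), then "reduce"/"sum" their sizes.
std::map<GroupKey, std::uint64_t> groupAndSum(const std::vector<AllocationRow>& rows) {
    std::map<GroupKey, std::uint64_t> totals;
    for (const AllocationRow& row : rows) {
        totals[{row.process, row.stack}] += row.sizeBytes;
    }
    return totals;
}
```

Dragging another column to the left of the yellow line in WPA is roughly like adding another field to `GroupKey`: the rows split into finer groups, and each group gets its own sum.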
Columns in WPA are special, however: they can have complex rules and configurations to customize everything to your liking. If you're interested in this, click on the wheel icon at the top of the pane (next to the red "3" marker).
Here you can do a couple of things, which I've marked with the red numbers:
To get function names you have to load symbols. Unfortunately, even if you use a "Filter" it'll load symbols for all applications by default. This takes a long time. So what you can do is add a filter for symbol loading:
At the end it'll look something like this:
For some reason my WinDbg can't search any heaps anymore, but I need that, because otherwise I can't find the addresses of the AtlasEngine instances in memory. So, unfortunately, I'll have to get back to you on the dump later.
However, given the stack trace in the WPR trace, I believe it's likely that the driver allocates a 64 MiB ring buffer for uploading Direct3D resources that have D3D11_CPU_ACCESS_WRITE.
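A ring buffer for uploads typically works like this: one fixed-size block is allocated once, each CPU-written upload is carved from it sequentially, and allocation wraps around to the start once the oldest uploads have been consumed. The sketch below is a hypothetical C++ illustration of that sub-allocation scheme only. It is not Intel's driver code, `UploadRing` is an invented name, and FIFO retirement via `releaseOldest()` stands in for waiting on a GPU fence. It tracks offsets only; the backing storage would be the single fixed block (64 MiB in the suspected case), which, if committed up front, costs its full size per instance no matter how little is uploaded.

```cpp
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical upload ring: hands out offsets into one fixed-capacity buffer.
class UploadRing {
public:
    explicit UploadRing(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of a contiguous `size`-byte chunk, or nullopt if the
    // ring is currently too full (a real driver would wait on the GPU here).
    std::optional<std::size_t> allocate(std::size_t size) {
        if (size == 0 || size > capacity_) return std::nullopt;
        std::size_t offset = 0;
        if (!live_.empty()) {
            const std::size_t tail = live_.front().offset;                     // oldest in-flight chunk
            const std::size_t head = live_.back().offset + live_.back().size;  // end of newest chunk
            if (live_.back().offset >= tail) {
                // Not wrapped: free space is [head, capacity) and [0, tail).
                if (head + size <= capacity_) offset = head;
                else if (size <= tail) offset = 0;  // wrap around to the start
                else return std::nullopt;
            } else {
                // Wrapped: free space is only [head, tail).
                if (head + size <= tail) offset = head;
                else return std::nullopt;
            }
        }
        live_.push_back({offset, size});
        return offset;
    }

    // Retires the oldest chunk (stand-in for "the GPU finished reading it").
    void releaseOldest() {
        if (!live_.empty()) live_.pop_front();
    }

    std::size_t capacity() const { return capacity_; }

private:
    struct Chunk {
        std::size_t offset;
        std::size_t size;
    };
    std::size_t capacity_;
    std::deque<Chunk> live_;  // in-flight chunks, oldest first
};
```

For example, a 100-byte ring fits two 40-byte uploads; a third fails until the oldest is released, after which the next upload wraps around to offset 0.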
In any case, I believe this may be another indication why we need #15186 much more urgently than it may seem.
Thanks a lot for your help and detailed explanation. Now I have my stack view and flame graph, too!
The laptop is using the latest graphics driver from the OEM (Lenovo), but Intel released a newer version on March 27. After a three-day public holiday, I can try my luck again with an update.
I'm glad to report that this issue is solved by the latest Intel graphics driver (31.0.101.5382), updated from the OEM-provided driver 31.0.101.5008. It works in both the latest release (1.21.921.0) of Terminal and the Canary.
On the same laptop, Canary with 10 tabs uses about 380 MB (Automatic or Direct3D 11). Software rendering uses a much lower 100 MB, but 380 MB is quite acceptable. The latest release with AtlasEngine shows roughly the same numbers.
Frankly, I didn't expect the new driver to work so well, since its release notes don't mention this issue at all. There were multiple releases in between, so the fix must have come in one of them.
Here is how a 10-tab Canary process now looks in VMMap. The 64 MB regions with 1 Read/Write are gone, replaced by roughly 32 MB per tab after some activity.
Huge thanks to @lhecker again. Your help and guidance are truly invaluable!
Thank you so much for following up! We'll close this and keep it around for anybody who asks this question.
Windows Terminal version
1.19.10821.0
Windows build number
10.0.22631.3296
Other Software
No response
Steps to reproduce
Expected Behavior
The hardware rendering option should use much less memory.
Actual Behavior
Already reported. I suspect this issue may be hardware-related. It happens on a new ThinkBook 16 Gen 6+ laptop with an Intel Core Ultra 7 CPU and Intel Arc graphics card. The driver is up to date (v31.0.101.5008). It happens regardless of which shell is opened.
On the same machine, no other programs (e.g. Chrome and VSCode) have this issue. It doesn't happen on other machines I've tested with exactly the same Windows Terminal configuration.
In VMMap, it can be observed that the Private Data takes 9/10 of the committed memory. It contains multiple regions of 65,536 KB marked as Thread Environment Block.