microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License
95.66k stars 8.32k forks source link

ConPty somethimes hangs when calling `ClosePseudoConsole`, (pseudo console created with `PSUEDOCONSOLE_INHERIT_CURSOR`) #17688

Closed FrancescElies closed 2 months ago

FrancescElies commented 3 months ago

Other Software

wezterm 20240807-131622-ee063330

I am opening an issue here because as I understood conpty code lives here in this repo

Steps to reproduce

Sadly we couldn't manage to get consistent way to reproduce this.

We see wezterm in windows sometimes hanging when closing panes, e.g. https://github.com/wez/wezterm/issues/5882

I am suspecting this might be related to https://github.com/microsoft/terminal/issues/1810

this comment points out to a race condition, when ConPTY is being started with CreatePseudoConsole() with the flag PSEUDOCONSOLE_INHERIT_CURSOR (which wezterm does) this kind of fit with the fact we can't consistenly reproduce this issue.

Here the start call in wezterm. image

Actual Behavior

ClosePseudoConsole hangs

lhecker commented 3 months ago

Regarding that PSEUDOCONSOLE_INHERIT_CURSOR comment:

In #17510 I added a timeout to cursor inheritance so in the worst case it will only block for 3 seconds. I was about to repurpose this issue to track the remaining problem: A shutdown signal should kill ConPTY instantly and not only after 3 seconds (hence the edits on your message). But then I realized that I also fixed that, so I think this has actually fundamentally been fixed in that PR.

However, then I also realized that this can't cause the freeze on shutdown. I've replaced ConPTY with a build from main in my local wezterm and it still happened. This indicates that the freeze occurs due to a different issue.


I don't know whether that causes your specific deadlock, but wezterm does definitely (and unfortunately) make the primary mistake when using the ConPTY API: It fails to read from the output pipe when calling ClosePseudoConsole. This is also documented here: https://learn.microsoft.com/en-us/windows/console/creating-a-pseudoconsole-session#preparing-the-communication-channels

To reproduce the issue, run a child process inside your shell (for instance WSL), print a ton of text in large chunks (for instance with cat = chunk size 128KiB) and close the tab. The deadlock inside OpenConsole then looks like this:

image

This is one of the reasons why I've chosen to remove the blocking wait from the ClosePseudoConsole API. Starting Windows 26100 (Windows 11 24H2) it'll now instantly return without waiting for the PTY to exit. (This reminds me that I need to update the winconpty package to also use that new, safer behavior.)

My suggestion for wezterm is that it reads from the output pipe until ReadFile either returns FALSE or lpNumberOfBytesRead is zero. I recommend against testing for an ERROR_BROKEN_PIPE error, because for regular pipes that's the same anyway, but for special pipes that support "graceful shutdown" (like TCP's FIN/ACK), ReadFile will return TRUE while lpNumberOfBytesRead is zero. This still means that the other side wants to end things.

With the next version of ConPTY at the end of this month, the new, safer API will be available. It'll allow you to safely call ClosePseudoConsole on the UI thread and/or in between Read/WriteFile calls. Even better, it'll support overlapped IO, which may be helpful for wezterm, considering that it uses poll() for UNIX as well, from what I can tell. That way both implementations can use async IO with the PTY.

P.S.: I just noticed that wezterm calls it "psuedo". Not just the flag, but also the entire file ("psuedocon.rs"). πŸ˜…

FrancescElies commented 3 months ago

Thanks for your quick reply.

I’m a regular Wezterm user, what i noticed is that ClosePseudoConsole is blocking indefinitely (definitely more than 3 seconds).

Next week I should be able (hopefully will be easy) to compile conpty.dll from this repo from the main branch and reproduce the test wit β€˜cat’ and closing the term.

Psuedocon typo i can report it back to wezterm devs, doing a pr with the rename might be received as inappropriate.

Regarding your reafile suggestion sounds something like I could try to make a patch for but apart from referring tour comment i am not sure i will be able to skillfully respond any questions reviewers might ask.

PS: how did you catch de deadlock stack? With procdump, could you give some details on that?

lhecker commented 3 months ago

Next week I should be able (hopefully will be easy) to compile conpty.dll from this repo from the main branch [...]

If you hit any problems and got questions, please let me know! I'll try to get the ConPTY API updated by then.

[...] but apart from referring tour comment i am not sure i will be able to skillfully respond any questions reviewers might ask.

If the reviewers have any questions specific to ConPTY and you aren't sure about, please feel free to ping me. πŸ™‚ (I'm familiar with Rust so that shouldn't be a problem either. πŸ˜…)

PS: how did you catch de deadlock stack?

That's Process Hacker, now renamed System Informer (probably to avoid getting banned by non-IT admins). It's similar to Microsoft's own Process Explorer, but in my opinion it's better because I find it visually more pleasing, it has more features, and it's open source on GitHub.

Stacks can be observed by double-clicking any application, clicking on "Threads", sorting by TID (so that their order is stable and stops changing), and click each individual thread. System Informer will automatically download PDBs and resolve symbols for you.

image

Of course, this only works for basic triage, but in my experience that's often enough to get an initial idea of the problem.

That aside, if you open its Options dialog, you'll find a ton of features that you can optionally enable. I wouldn't recommend enabling them outright, but I think it's worth toying around with them. Something I use for basic testing of our text renderer for instance, is under "ExtendedTools". There, you can check "Enable FPS monitoring" and you'll be able to see DWM present statistics for each individual application.

All of these are also super useful:

image

These are great:

image

You can alternatively also click this button:

image

These are completely nuts:

image

None of this requires a kernel driver, and it really only scratches the surface. System Informer is one of the greatest Windows apps I know.

FrancescElies commented 3 months ago

Leonard thanks for the crash course on Process Informer, in the passt I installed but felt overwehlmed the the amount of options and I used to use only find handles or dlls, you convinced me to install it again and invest some time with it.

I am trying to give you some more info to make sure what we see it's the same issue you described, today it hanged again and decided to collect infos the way you showed me.

The main problem was identifying which OpenConsole to inspect, how do I know?. As you can see I started wezterm from windows term to see logs and had several sessions open. image

Since I did not now which OpenConsole to inspect I decided to create a dump for each of them and wezterm too, in this case with procdump and a bit of shell ps | find opencons | each { procdump -ma $in.pid} and inspected these files with windbg.

image

Out of those dumps I could see wezterm-gui image

Openconsole (most of them having a similar stack) image

Except one, which was showing something with ticket_lock, I suspect this was the one hanging?πŸ’₯ image

Observations

Next

PS: I am happy to learn better ways of doing things, please feel free to suggest me at any time how you would do it if you know a nicer way.

lhecker commented 3 months ago

When you check out these older OpenConsole versions (prior to #17510), you need to check the stack of the RenderThread::s_ThreadProc thread. That's the thread which calls WriteFile on the output pipe that wezterm is supposed to read from. While it's writing into the pipe it's holding the ticket lock and if no one reads from the pipe it holds the lock forever.

If I'm correct, then that's why the ConsoleIoThread (= handles incoming API calls) is stuck waiting to acquire the lock.

FrancescElies commented 2 months ago

building conpty.dll

I will need some help building the latest conpty.dll, I read building.md and ran Invoke-OpenConsoleBuild

I got some errors

"C:\s\oss\terminal\OpenConsole.sln" (default target) (1) ->
"C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj" (default target) (114) ->
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.Wi
ndowsDesktop.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDepe
ndencies
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.NE
TCore.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDependencie
s
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.As
pNetCore.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDependen
cies

image

Despite the errors I got a conpty.dll built.

❯ : fd -HI conpty.dll
bin\x64\Debug\conpty.dll
obj\x64\Debug\winconpty.DLL\
obj\x64\Debug\winconpty.DLL\conpty.dll.recipe
obj\x64\Debug\winconpty.DLL\winconpty.DLL.tlog\
obj\x64\Debug\winconpty.DLL\winconpty.DLL.tlog\winconpty.DLL.lastbuildstate

terminal on ξ‚  main [?] via .NET v8.0.303
❯ : sigcheck -nobanner -q -n bin\x64\Debug\conpty.dll
n/a

Sadly I can't see it's version, I guess because I am building directly from master.. I will put that dll into wezterm and and modify wezterm to call ReadFile before closing, let's see how that one works out.

cat bigfile experiment

I tried to cat and close a pane with a 1MB file big, yes the term blocked but eventually it got closed as soon as cat was done doing it's thing.

lhecker commented 2 months ago

I will need some help building the latest conpty.dll, I read building.md and ran Invoke-OpenConsoleBuild

Despite the name, that command builds the entire solution including Windows Terminal and so it takes a long time to complete. It'll also build in Debug mode by default. Here's a command to compile it in Release mode and to only compile the two projects that are needed:

Invoke-OpenConsoleBuild '-p:Configuration=Release' '-t:Conhost\Host_EXE,Conhost\winconpty_DLL'

You can also compile it from within Visual Studio if you open the solution: image

The output will be in bin\x64\Release.

Sadly I can't see it's version, I guess because I am building directly from master..

Yes, we add the version in our build pipelines as a separate step.

I will put that dll into wezterm and and modify wezterm to call ReadFile before closing, let's see how that one works out.

If I'm correct with my previous assumption, you don't need this new version to fix this issue in wezterm. The issue is merely that it needs to continue reading from the output pipe until after ClosePseudoConsole has returned. The new version is simply faster and more robust, but it's not required to fix this.

Also, yesterday I realized that wezterm can simply close/drop the output pipe handle before calling ClosePseudoConsole. That would also fix the issue.