Open j4james opened 3 years ago
This may seem like a contrived example, and it is, but it's based on a real issue I had when testing my sixel branch with the sixel-bench benchmark. Much like the above animation, it caused the console to hang for a couple of seconds, and eventually just displayed the final frame of the video.
I suppose sixel-bench is really a contrived example too, so not an urgent problem (especially since we don't have sixel yet), but this definitely does feel like a bug - I don't think it should be possible to hang the console like that.
@j4james this issue could be Windows version specific. You're using Windows 10. On version 10.0.2200 (Windows 11) after setting Windows Console Host
as the default terminal application in SUI, and WT restart, this is the output:
https://user-images.githubusercontent.com/3933920/142770292-84d4fdd3-45d9-4fb5-99be-7a8cbc723e6b.mp4
WT version 1.11.2921.0. Same result when WT is the default terminal application. This post is for reference only!
Huh. That's weird. I wouldn't expect the Windows version to have any effect, but I wonder if it has something to do with the version of WSL you're using (I'm on v1). This test relies on Python passing the write
call through to conhost as a single output operation. I've tested both Python 2 and 3, so the Python version shouldn't be a factor, but it's possible the WSL version may affect things.
I'll try and rewrite the test case in C, with a direct call to WriteConsole
, which should guarantee the buffer won't be split up by some intermediate code.
Thanks for attempting to replicate the issue though.
Here's the equivalent test case rewritten in C++ with a WriteConsole
call. But it'll now need to be run from a Windows shell like cmd
or PowerShell
.
#include <windows.h>
#include <string>
#include <cmath>
void main()
{
HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD mode;
GetConsoleMode(handle, &mode);
mode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
SetConsoleMode(handle, mode);
CONSOLE_SCREEN_BUFFER_INFOEX csbie;
csbie.cbSize = sizeof(csbie);
GetConsoleScreenBufferInfoEx(handle, &csbie);
const auto w = csbie.dwSize.X;
const auto h = csbie.srWindow.Bottom - csbie.srWindow.Top + 1;
const auto mx = w/2;
const auto my = h/2;
const auto pi = 3.14159265358979323846;
printf("Generating content...\n");
std::string s = "\033[?25l";
for (auto i = 0; i < 512; i++) {
s += "\033[H";
for (auto y = 0; y < h; y++) {
for (auto x = 0; x < w; x++) {
auto dy = y-my;
auto dx = x-mx;
auto a = std::atan2(dy*2,dx) + pi;
auto c = std::to_string((int(a/pi*127)+i)%256);
s += "\033[38;2;"+c+";"+c+";"+c+"m*";
}
}
}
s += "\033[?25h\033[m\033[H";
printf("Starting animation...\n");
DWORD written = 0;
WriteConsoleA(handle, s.c_str(), s.length(), &written, NULL);
}
@j4james this is the output with the exe version of rotate in 1st tab, and python/wsl in 2nd tab:
https://user-images.githubusercontent.com/3933920/142777127-03c87210-451f-48e9-b9f8-8bd59653d9a9.mp4
Changing default terminal application
setting did not make any difference in any tab.
@j4james when running the python script in pwsh, it works but rendering seems slower compared to wsl test:
https://user-images.githubusercontent.com/3933920/142780410-9fd0496c-7775-4455-8f77-b6b2ae83b0cc.mp4
The same size = ['30', '120']
was used as in the wsl script.
I should have mentioned I didn't expect to see the problem on the Windows version of Python, because I know it handles I/O differently to Linux, and the output gets split into chunks. But judging from your results, it looks like the same thing can apply to Linux.
At least the C++ version confirms the problem, although your system is clearly a lot faster than mine. so you're not seeing much of a delay before it renders the final frame.
@j4james what is the difference between printf
and writeconsole
? This is what I've changed after line 38:
printf("Starting animation...\n");
printf("%s", s.c_str());
and this is the output:
https://user-images.githubusercontent.com/3933920/142925001-9f36aa29-3356-4642-9323-994fe58a86d9.mp4
You know, I think @lhecker actually fixed this recently, but I'd assume the fix isn't in 10.0.19041.1348
Yeah I had regular freezes in WT of up to 30 seconds, when running the application at high frame rates.
Since it was a heisenbug and only occurred when you weren't observing it (with a debugger) I could never actually proof the reason for the issue. But I believe I could determine it regardless: Locks like SRWLOCK are unfair locks and under certain circumstances it can prevent threads from progressing for a long time.
Based on this theory I changed the buffer lock in WT into a til::ticket_lock
and the issue hasn't occurred for me since.
Integrating til::ticket_lock
into conhost could be slightly more difficult as it uses critical sections and might rely on its support for recursive locking.
You need to test with the C++ example - the Python version doesn't seem to work for everyone (I'm guessing because you're all on a later version of WSL to me). I can reproduce this with a recent version of conhost built from source and a recent version of Windows Terminal built from source.
what is the difference between
printf
andwriteconsole
?
@elsaco printf
is going to build up the output string in a temporary buffer, which it will periodically flush when full. That results in a bunch of small WriteConsole
calls instead of one big one, which defeats the purpose of the test.
Btw, this is what it looks like for me in OpenConsole and Windows Terminal, both built from commit a68b0d4f08e1e8ffd87d73776ed83d45fec16042.
https://user-images.githubusercontent.com/4181424/142938681-57f0cace-f78e-4767-9955-6d67c7232970.mp4
@j4james if the same rotate.exe
is run from a WSL prompt the delay is clearly visible. Also, please notice how the last frame looks. There's about 4-5 sec between Starting animation...
and end of test:
https://user-images.githubusercontent.com/3933920/142979310-4bc162e6-6e3a-423b-9c53-0a82cc80f849.mp4
After moving LockConsole
/ UnlockConsole
into WriteConsoleWImplHelper
and limiting the chunk size passed to DoWriteConsole
to 2kB:
https://user-images.githubusercontent.com/2256941/143775617-ba4cd95c-579f-4ccc-accb-6b31e98beeef.mp4
After additionally replacing the _csConsoleLock
critical section with:
static thread_local ULONG recursionCount = 0;
static til::ticket_lock lock;
...and increasing the chunk size to a more reasonable 128kB (since the ticket lock is now being "fair"):
https://user-images.githubusercontent.com/2256941/143775679-60d160de-9089-44ab-8b93-648661b2c678.mp4
Unfortunately however, chunking the input is likely not a viable option, as some applications rely on the implicit promise that a single write() with VT is processed synchronously. For instance:
https://user-images.githubusercontent.com/2256941/143776195-fc721a4b-816e-4f5c-a1ad-50a210982a85.mp4
Unfortunately however, chunking the input is likely not a viable option, as some applications rely on the implicit promise that a single write() with VT is processed synchronously.
That's a good point. So maybe this is just the caller's responsibility then, and the issue could be closed as "won't-fix". My main concern was the failure of sixel-bench, but it looks like I'm the only own that can reproduce the issue in Python (which is what sixel-bench is using), so it's likely that could just be resolved by an update of Windows or WSL.
@j4james But at least one improvement will result out of this: As you can see in my video above, even 2kB chunks of data can cause the output to lag in conhost (and ConPTY) if an application is writing VT sufficiently fast. By switching to a fair mutex (til::ticket_lock
) we can improve the user experience. I'm going to submit a PR for that soon. 🙂
The performance hit of that seems to be less than 1.5% in my benchmarks.
By switching to a fair mutex (
til::ticket_lock
)
Just be aware that there might be a few places in conhost where the lock is acquired recursively (I'm fairly certain I've come across some in the past). I think I remember you saying that the ticket_lock
wasn't re-entrant, which I assume would be problematic.
I don't think it would be that difficult to refactor the code to avoid recursive locks, but it might be a bit of work trying to track down all the cases that need to be dealt with.
I don't believe it's all too difficult to make a mutex save for reentrancy. My approach is basically:
static thread_local ULONG recursionCount = 0;
static til::ticket_lock lock;
void LockConsole() {
const auto rc = ++recursionCount;
if (rc == 1) {
lock.lock();
}
}
void UnlockConsole()
{
const auto rc = --recursionCount;
if (rc == 0) {
lock.unlock();
}
}
It's missing safety-checks for over- and underflow, but otherwise it works fine.
I just randomly came across this issue while searching for sixels of all things. I think we can close this now, in particular with the new passthrough support. BTW the animation from the C++ snippet now works perfect (again?)!
@lhecker This still hangs for me in conhost, which it what the issue was originally about. And while it's improved in Windows Terminal, it still has a significant delay before you see anything (assumedly because conhost has to process the buffer first). This is most obvious when running the sixel-bench test.
Windows Terminal version
n/a
Windows build number
10.0.19041.1348
Other Software
No response
Steps to reproduce
The Python example below does not reproduce the issue reliably on all systems.
This constructs a little VT animation which it outputs with a single
write
call.Expected Behavior
You should see a rotating pattern, a bit like a radar scan. This is what it looks like in MinTTY (using wsltty):
https://user-images.githubusercontent.com/4181424/142741894-fb9637da-6046-4c7b-b4cb-6e0e2efc583f.mp4
You'll see the same sort of thing in XTerm, VTE, Alacritty, Kitty, etc.
Actual Behavior
Conhost displays the first couple of lines of the first frame (if anything), then hangs for a couple of seconds, and eventually just shows the final frame.
Windows Terminal is a little better, in that it doesn't hang, but you still don't see the animation - just the final frame.
I believe the problem is that the buffer is locked for the entire time that the VT parser is processing a write operation, and the render thread can't render anything when it can't obtain the lock. If the write buffer is particularly large (as is the case here), then the renderer may be blocked for quite a long time.