Open savetheclocktower opened 2 months ago
I spent 45 minutes getting this working on Windows and it's driving me nuts.
First: stdio: 'inherit'
doesn't work any better on Windows when we call spawn
instead of spawnSync
, so instead I've switched it to stdio: 'pipe'
and hooked up the pipes. This works great! We get output streaming and everything.
Now there's one more headache: character encoding.
With this PR checked out and ATOM_DEV_RESOURCE_PATH
set locally…
$env:ATOM_DEV_RESOURCE_PATH = "C:\Path\To\Pulsar"
I can run pulsar --dev --package list
in Windows Terminal and see all the pretty tree-like output of that command get mangled. Intuitively, I understand why this happens: all Windows terminals and shells inexplicably (well, probably for backwards compatibility) use Code page 437. Anyone can change their terminal's encoding to UTF-8 via chcp 65001
, but we can't make our users do that.
The headaches only start when you realize that ppm list
works just fine! All of our weird line-drawing characters are written to stdout as intended. It's only when we add the layer of indirection via child_process.spawn
— or spawnSync
, or exec
, or cross-spawn — I just tried all of them! — that things go sideways.
The one thing I've done that's actually solved this is something that I think is just a bit too clever: alter pulsar.cmd
so that it remembers the user's current character set, calls chcp 65001
at the top of the script, then changes it back to the original value just before we exit. This would be a great solution if we could guarantee that the cleanup code would run no matter what — but we can't, because batch files don't really have an equivalent for trapping SIGINT
like shell scripts can.
So here are some options, in no particular order:
ppm list
to use only the characters that are truly portable across common terminal character encodings.Other suggestions are welcome.
I did some experiments that clarified my thinking here.
My initial theory was that both pulsar
and ppm
were able to output UTF-8 properly into a CP437 console, but that the combination of the two — shelling to ppm
from inside pulsar
— was screwing things up.
I tested this with a small Node project with various scripts that log the text ├──
to stdout:
test.js
script that logs directly,test.cmd
script whose job is to call node.exe test.js
,wrapper.js
script that calls test.js
via child_process.spawn
, andwrapper.cmd
script whose job is to call node.exe wrapper.js
.I could not get the encoding bug to manifest in any of these scenarios.
Running out of ideas, I started calling console.log("├──")
or echo ├──
at pretty much every step in the process: .cmd
files, JS files, and so on. My new theory is that Node handles this correctly (magically ensuring that its UTF-8 output is rendered properly in a CP437 console), but Electron does not. If I insert console.log("├──")
somewhere in start.js
, the encoding is wrong in the output, even if I never shell out to ppm
.
That said, I'm failing in my vague attempts to determine why this is true. If there were some explicit trick that Node were using, it would likely be visible in the Node source code, but I'm not seeing it. Maybe we'll get lucky and find out that this has been fixed in a future version of Electron, but for now I can think of only two ways to work around the problem:
pulsar.cmd
— but a way that won't leak side-effects no matter what.ppm
to symbols that are safe to rely on even if we don't know the encoding of the terminal.Option 2 is easy enough (if unsatisfying), but I think we can manage option 1.
It's true (as I said above) that there's no easy way to trap a Windows shell's equivalent of SIGINT. If pulsar.cmd
calls chcp 65001
before it calls Pulsar.exe
, then we need a way to run some code after that's done, no matter what happens with that call — even when it errors or the user cancels it with Ctrl+C.
But there are theoretical ways around this, depending on how ridiculous you're willing to get! I'll try two of the approaches described in this Stack Overflow answer and let you know how it goes.
OK, it's even weirder.
Let me start over and state a hypothesis:
Node for Windows (at least as of v16, which
ppm
uses) has some magic way of ensuring that either (a) what it logs to stdout (which is generally assumed to be UTF-8 in all cases) has its encoding converted to match the default CP437 encoding of a Windows terminal; or (b) the terminal itself is temporarily changed to a UTF-8 code page, then reliably switched back after a script exits.Electron does not have this magic — either because it's lacking altogether, or because there's an extra layer of abstraction that gets in the way of the magic behavior.
Let's take the character ├
as an example. If my terminal is expecting UTF-8, I can call console.log("├")
and everything works. But if it's expecting CP437, I instead see Γö£
. That's what you'd expect if a terminal were interpreting UTF-8 text as CP437: instead of one character, you get three.
Why three? Because the UTF-8 code point for ├
is 0x251C
. That becomes the raw bytes e2
, 94
, and 9C
via an algorithm I don't understand. But if those three bytes are interpreted as separate characters (as they would be if no conversion were taking place), they'd be the characters Γö£
.
Again, all this is exactly what you'd expect in this situation. Node's behavior is the unusual part! Either it's doing behind-the-scenes conversion of characters or it's telling the terminal to use UTF-8 (but only for the duration of a given command).
When we run pulsar
and ask it to write to STDOUT, what we're getting from it is basically main-process logging. On a hunch, I started a new Electron boilerplate project and did console.log("├──");
in the entry point script; it output Γö£ΓöÇΓöÇ
into my terminal.
I had thought that our last-ditch option here would be to make sure we were emitting only characters that were mentioned in this guide, but this guide isn't what we need, because it assumes that encoding conversions are being done correctly. The box-drawing symbols we're using for ppm list
do exist in CP437, but at different code points. (Obvious in retrospect, since the code points used for the UTF-8 versions are multibyte and thus can't exist in CP437.)
Now I think our last-ditch option is to set an environment variable that indicates a Windows user's code page. If it's not 65001, ppm list
could detect that and respond with plaintext.
An aside on the batch file exception thing: as expected, it was more complicated than it seemed. The best I could do was to copy this approach, but the annoying "Terminate batch job? (Y/N)" prompt was my undoing. This approach means that the prompt is shown twice in a row, and the cleanup code only happens if you answer "Y" to the first prompt and "N" to the second.
Anyway, a couple more thoughts:
pulsar --package
output on Windows had been working correctly. I thus don't see this as a blocker, especially considering it really only affects non-ASCII output in ppm
, of which ppm list
might be the only common example. Hence I think I'll take this out of draft now.pulsar --package
handling to the scripts (pulsar.cmd
and pulsar.sh
) instead of making it the responsibility of the Electron main process, then that'd be the slam-dunk best option all around.
This is a concept for how we'd rewrite argument parsing to make
--package
work better and fix the issues detailed in #1053.I haven't tried any of this on Windows yet — nor have I written new specs or tried to run existing specs — but this passes some sanity tests on my macOS machine:
--version
is handled properly;pulsar --version
does what it always did, whereaspulsar --package --version
properly forwards toppm --version
.ppm
is moved tostart.js
so that it can usespawn
instead ofspawnSync
. This might be enough on its own to fix our issues with{ stdio: 'inherit' }
on Windows; but if it isn't, then it at least gives us more flexibility with using{ stdio: 'pipe' }
to produce the same effect.If I take this out of draft, it means that I've confirmed that these techniques fix our Windows issues, ensured existing specs pass, and written new specs for
--package
behavior.