Open TheCloudlessSky opened 1 year ago
Thanks for this feature request! The level of detail and code inspection is very appreciated.
If you are viewing this issue and would like to indicate your interest, please use the 👍 reaction on the issue description to upvote this issue. We also welcome additional use case descriptions. Thanks again!
@TheCloudlessSky What version of PowerShell are you using?
@kmoe v5 but I've also confirmed just with cmd and PowerShell 7.3.
Unfortunately it seems I didn't retain good enough notes about everything I was referring to when making the change that caused this problem, but we made these changes in response to recommendations from Microsoft and so expected that they were good advice. I'm sorry that turns out not to be true. :confounded:
One reference I do still have a link to from my notes is Classic Console APIs versus Virtual Terminal Sequences. That document is primarily concerned with the other big thing we changed at the same time -- activating virtual terminal processing instead of using the legacy console API -- but it does still briefly touch on UTF-8 support:
Unicode
UTF-8 is the accepted encoding for Unicode data across almost all modern platforms, as it strikes the right balance between portability, storage size and processing time. However, Windows historically chose UTF-16 as its primary encoding for Unicode data. Support for UTF-8 is increasing in Windows and use of these Unicode formats does not preclude the usage of other encodings.
The Windows Console platform has supported and will continue to support all existing code pages and encodings. Use UTF-16 for maximum compatibility across Windows versions and perform algorithmic translation with UTF-8 if necessary. Increased support of UTF-8 is in progress for the console system.
UTF-16 support in the console can be utilized with no additional configuration via the W variant of all console APIs and is a more likely choice for applications already well versed in UTF-16 through communication with the wchar_t and W variant of other Microsoft and Windows platform functions and products.
UTF-8 support in the console can be utilized via the A variant of Console APIs against console handles after setting the codepage to 65001 or CP_UTF8 with the SetConsoleOutputCP and SetConsoleCP methods, as appropriate. Setting the code pages in advance is only necessary if the machine has not chosen "Use Unicode UTF-8 for worldwide language support" in the settings for Non-Unicode applications in the Region section of the Control Panel.
For this section in particular, I will concede that we don't seem to be following this recommendation fully, although I'm also not sure this section is making a single clear recommendation, as opposed to just gesturing vaguely at a handful of different options.
Specifically, we are using SetConsoleOutputCP to change the console's "legacy codepage", but I believe all subsequent writes to the console are effectively using the W variants of the file I/O APIs, although we're doing that only indirectly through the Windows implementation of Go's os package.
Based on what I remember about these parts of the Windows API (my memory is spotty, since it's been a long while since I did real Windows API dev), only the A variants of the Win32 API functions are concerned with legacy codepages, and so it's quite possible that we don't actually need to change the legacy code page of the console now that modern Windows versions have a fully-unicode-aware text buffer. Older versions of Windows -- pre-Windows 10 -- could only retain in the console buffer characters from the currently-selected legacy codepage, but this documentation suggests that was fixed as part of all of the recent console modernization work.
I think the best next step here then would be to try removing the SetConsoleOutputCP call altogether and then encourage Terraform to write some astral plane characters like emoji to its output, and see if my hunch is correct that the modernized text buffer will retain the emoji characters even with the console codepage set to 437.
The following should test it, I think:
Set the console to a legacy code page such as 437 using chcp.
Write a configuration containing a check, or precondition, or similar whose error_message includes an emoji character or other astral plane character.
Run terraform plan and inspect the rendered diagnostic.
If the astral plane character appears correctly in the terminal, this test is successful and therefore we should be able to safely remove the SetConsoleOutputCP call.
I'd note that this will probably still cause Terraform to leave the terminal in Virtual Terminal Processing mode after it exits, which is not ideal, but hopefully still less troublesome than changing the console's legacy codepage.
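One thing worth checking before running that test: not every emoji is an astral plane character. Some, like U+2705, live in the Basic Multilingual Plane and would not exercise the surrogate-pair path at all. Here is a rough Go sketch for sanity-checking a candidate error_message string. The helper name hasAstral is made up for illustration and is not part of Terraform.

```go
package main

import "fmt"

// hasAstral (hypothetical helper) reports whether s contains any code point
// above U+FFFF, i.e. one that needs a UTF-16 surrogate pair.
func hasAstral(s string) bool {
	for _, r := range s {
		if r > 0xFFFF {
			return true
		}
	}
	return false
}

func main() {
	candidates := []string{
		"check failed \u2705",     // U+2705 is in the BMP, so not a useful test string
		"check failed \U0001F600", // U+1F600 is astral, so it exercises surrogate pairs
	}
	for _, msg := range candidates {
		fmt.Printf("%q astral=%v\n", msg, hasAstral(msg))
	}
}
```

A message for which this reports false would pass the test above without telling us anything about astral plane handling.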
I don't have a functioning Windows system handy to test this on right now, so I can't try this myself immediately, but hopefully the above is useful to someone who picks up this issue to work on in the future.
Terraform Version
Terraform Configuration Files
N/A. Simply running the CLI will cause this issue.
Debug Output
N/A. Simply running the CLI will cause this issue.
Expected Behavior
Running the terraform CLI shouldn't break other programs in the same console on Windows.
Actual Behavior
There was work done in https://github.com/hashicorp/terraform/issues/18242#issuecomment-759846280 about 2.5 years ago to re-do how Terraform handles reading/writing to the console on Windows so that things like colors would work with Unicode characters, etc.
We recently upgraded our CLI to 1.5.4 -- a version that includes the previously mentioned fixes -- and our deployment pipeline is now failing. Our deployment pipeline builds our application and also runs Terraform to initialize, validate, plan, and apply changes.
We have a part of our application that launches a separate process, writes to its standard input, and reads back the standard output. We have tests that confirm this always works. Since upgrading the CLI, our tests for launching this process have been failing. We eventually tracked it down: the process was not processing the standard input we were sending. We couldn't reproduce this when running our tests in Visual Studio, and couldn't reproduce it when we ran just our tests via console runners, but could reproduce it when doing our full deployment. And once our deployment failed, we couldn't get our tests to run via console runners again -- everything about the console seemed broken!
The crux of this problem is that Terraform is changing the code page of the console to UTF-8 in an attempt to use a single change to have a large impact. This is a bad idea on Windows because:
For example, opening a new PowerShell console and running these commands:
The default code page for me on Windows using the English United States locale is CP-437. You can see that Terraform changes it to 65001 (UTF-8).
This can affect other processes launched from the console after Terraform has run. For example:
Example C program:
target.c
NOTE: We actually use third-party binaries in practice, but this is a small example of how to reproduce the bug.
Compile the above:
Example C# source program:
Source.cs
Compile the above using .NET Framework 4.8 (not .NET Core/.NET 6, where I believe this issue is worked around).
Test:
This has the same output on Windows 10 and Windows 11.
I personally believe that it shouldn't be Terraform's responsibility to change configuration about the current console, especially the code page (as I mentioned, it's typically misinformed/bad advice). The gist of a fix is that Terraform should most likely be using Windows' Console APIs instead of File APIs for reading/writing text to the console. Here's how Python's CLI handles this: https://github.com/python/cpython/blob/5141b1ebe07ad54279e0770b4704eaf76f24951d/Modules/_io/winconsoleio.c
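To make that suggestion a bit more concrete, here is a minimal sketch in Go (Terraform's implementation language) of the conversion step a Console-API-based writer would need: turning UTF-8 text into the UTF-16 code units that a W-variant call such as WriteConsoleW takes. The helper name toUTF16 is hypothetical, and the actual WriteConsoleW call is left out so the sketch runs on any platform. This is not how Terraform currently does it.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 (hypothetical helper) converts a Go string, which is UTF-8, into
// UTF-16 code units. Code points above U+FFFF become surrogate pairs.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	units := toUTF16("ok 😀")
	// On Windows, a buffer like this is what would be handed to WriteConsoleW,
	// with no dependence on the console's legacy code page.
	fmt.Printf("%d code units: %04X\n", len(units), units)
	// prints: 5 code units: [006F 006B 0020 D83D DE00]
}
```

The point of going this route is that the text never passes through the console's code page at all, so there would be nothing to change and nothing to restore.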
As a temporary workaround/fast fix, Terraform could set the code page and then reset it once the CLI exits?
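A rough sketch of what that save-and-restore could look like, in Go. Everything here is illustrative: the console interface, fakeConsole, and withUTF8Output are names invented for this example. On Windows the two methods would presumably wrap the Win32 functions GetConsoleOutputCP and SetConsoleOutputCP; the fake stands in for them so the sketch runs anywhere.

```go
package main

import "fmt"

// console is a hypothetical abstraction over the console's output code page.
// Assumption: a Windows implementation would wrap GetConsoleOutputCP and
// SetConsoleOutputCP from kernel32.
type console interface {
	OutputCP() uint32
	SetOutputCP(cp uint32) error
}

// fakeConsole is an in-memory stand-in used only for this sketch.
type fakeConsole struct{ cp uint32 }

func (c *fakeConsole) OutputCP() uint32            { return c.cp }
func (c *fakeConsole) SetOutputCP(cp uint32) error { c.cp = cp; return nil }

const cpUTF8 = 65001 // CP_UTF8

// withUTF8Output remembers the current code page, switches to UTF-8,
// runs fn, and restores the original code page when fn returns.
func withUTF8Output(c console, fn func()) error {
	prev := c.OutputCP()
	if err := c.SetOutputCP(cpUTF8); err != nil {
		return err
	}
	defer c.SetOutputCP(prev) // best-effort restore; error ignored in this sketch
	fn()
	return nil
}

func main() {
	c := &fakeConsole{cp: 437}
	_ = withUTF8Output(c, func() { fmt.Println("during:", c.OutputCP()) })
	fmt.Println("after:", c.OutputCP())
}
```

One caveat with this approach: a deferred restore does not run if the process calls os.Exit or is killed, so the console could still be left on 65001 in those cases.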
Steps to Reproduce
chcp 437
terraform --version
chcp -- prints 65001
Additional Context
No response
References
No response