Windows renders Unicode symbols as question marks in terminal when default codepage is active

I'm booted into Windows to do a Unicode bug bash on that platform. For those not in the know Windows support for Unicode is basically the worst of all three major OSs and Windows-specific Unicode bugs are practically guaranteed. Here is one.

On Windows, open the Windows Terminal app (or probably any CLI app will work). Download the latest TLA+ tools build and run java -cp tla2tools.jar tla2sany.SANY Test.tla to parse this file:

---- MODULE Test ----
ASSUME ℕ
====

This will generate the following output:

****** SANY2 Version 2.2 created 08 July 2020

Parsing file C:\Users\ahelw\src\java-tools\tlatools\org.lamport.tlatools\Test.tla (file:/C:/Users/ahelw/src/java-tools/tlatools/org.lamport.tlatools/./Test.tla)
Parsing file C:\Users\ahelw\AppData\Local\Temp\tlc-3969957912596007004\Naturals.tla (jar:file:/C:/Users/ahelw/src/java-tools/tlatools/org.lamport.tlatools/dist/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla)
Semantic processing of module Naturals
Semantic processing of module Test
Semantic errors:

*** Errors: 1

line 3, col 12 to line 3, col 12 of module Test

Unknown operator: `?'.

Following the recommendation here, we can change the default Windows codepage (850) to 65001 with the command chcp 65001. Upon executing that command the Unicode symbol will render correctly:

> chcp 65001
Active code page: 65001
> java -cp .\dist\tla2tools.jar tla2sany.SANY .\Test.tla

****** SANY2 Version 2.2 created 08 July 2020

Parsing file C:\Users\ahelw\src\java-tools\tlatools\org.lamport.tlatools\Test.tla (file:/C:/Users/ahelw/src/java-tools/tlatools/org.lamport.tlatools/./Test.tla)
Parsing file C:\Users\ahelw\AppData\Local\Temp\tlc-4161744943301846090\Naturals.tla (jar:file:/C:/Users/ahelw/src/java-tools/tlatools/org.lamport.tlatools/dist/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla)
Semantic processing of module Naturals
Semantic processing of module Test
Semantic errors:

*** Errors: 1

line 3, col 12 to line 3, col 12 of module Test

Unknown operator: `ℕ'.

There isn't really anything we can do about this issue so I am just going to file it then close it in case anybody else encounters & searches for it. According to one of the answers in the linked SuperUser StackExchange post, Windows is experimenting with a beta "Use Unicode UTF-8 for worldwide language support" setting in the language settings menu which also fixes this issue. Perhaps this will become the default one day and then this particular problem will go away for everybody. Enabling that beta setting also seems to work to fix this issue.

tlaplus / tlaplus

Windows renders Unicode symbols as question marks in terminal when default codepage is active #1003