Closed oshaboy closed 1 year ago
To clarify, I understand if the "round trip safe unambiguous translation of basic tokens" is out of scope for the font. I just wanted to ask.
The recommended way to represent these is with the ASCII characters and zero-width joiners. So 0xEC would be represented as U+0047 U+200D U+004F U+200D U+0020 U+200D U+0054 U+200D U+004F. The ZWJs make this round-trippable, and since they are zero-width it already displays correctly.
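For anyone who wants to try the ZWJ approach, here is a minimal Python sketch of the round trip. The names `TOKENS`, `to_unicode` and `from_unicode` are hypothetical, the table holds only the one token mentioned in this thread, and all other bytes are assumed to map straight to ASCII, which is a simplification of the real character set.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Hypothetical, partial token table: only 0xEC -> "GO TO" is taken from this thread.
TOKENS = {0xEC: "GO TO"}

# Reverse lookup from the ZWJ-joined spelling to the token byte, longest first
# so that a longer token is never mistaken for a shorter one it starts with.
JOINED = sorted(
    ((ZWJ.join(text), code) for code, text in TOKENS.items()),
    key=lambda pair: len(pair[0]),
    reverse=True,
)

def to_unicode(data: bytes) -> str:
    """Translate bytes to text, marking tokens by joining their letters with ZWJ."""
    out = []
    for b in data:
        if b in TOKENS:
            out.append(ZWJ.join(TOKENS[b]))
        else:
            out.append(chr(b))  # simplification: treat every other byte as ASCII
    return "".join(out)

def from_unicode(text: str) -> bytes:
    """Translate text back to bytes, collapsing ZWJ-joined runs into token bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        for joined, code in JOINED:
            if text.startswith(joined, i):
                out.append(code)
                i += len(joined)
                break
        else:
            # Plain character; raises ValueError for anything outside 0..255.
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

With this, `bytes([0xEC])` and `b"GO TO"` produce different strings that look the same on screen, and both convert back to the bytes they came from.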
I see, thanks for the clarification
The ZX80/81 and Spectrum have special character codes in their character sets for entire BASIC tokens in order to save some of their limited memory. So there were character codes for full strings like IF, FOR, GO TO, etc. Of course those can be encoded as just a sequence of ASCII characters, but then there is no way to distinguish between 0xEC and the sequence 0x47 0x4f 0x20 0x54 0x4f 0x20 in the translated output, making the 2 way translation impossible.
Many BASIC dialects used the extended characters for tokens, but not in a way so fundamental to the way the computer displayed characters like the ZX Spectrum. Even in machine code programs, if you tried displaying the character 0xEC on the screen it would instead display the string GO TO. (Edit: Turns out the ZX81 doesn't.)
Considering the font supports so many "Legacy Computing" symbols, including ones that aren't supported by Unicode, I think this is feasible.
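To make the ambiguity concrete, here is a small Python sketch of the naive translation described above. The table `NAIVE_TOKENS` and the function `naive_expand` are hypothetical names, and the trailing space in the token text is an assumption chosen to match the byte sequence quoted above.

```python
# Hypothetical, partial table; the trailing space mirrors the quoted byte sequence.
NAIVE_TOKENS = {0xEC: "GO TO "}

def naive_expand(data: bytes) -> str:
    """Expand token bytes to plain ASCII text, passing other bytes through."""
    return "".join(NAIVE_TOKENS.get(b, chr(b)) for b in data)

token_form = bytes([0xEC])
spelled_out = bytes([0x47, 0x4F, 0x20, 0x54, 0x4F, 0x20])

# Two different inputs collapse to the same output, so the original bytes
# cannot be recovered from the translated text alone.
same_output = naive_expand(token_form) == naive_expand(spelled_out)
```

Here `same_output` is `True`, which is exactly why a reverse translation has no way to pick between the two inputs.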